ABSTRACT


This dissertation approaches the solution of optimization models with uncertain parameters by considering the worst-case value of the uncertain parameters during optimization. We consider three problems resulting from this approach: a finite minimax problem (FMX), a semi-infinite minimax problem (SMX), and a semi-infinite min-max-min problem (MXM). In all problems, we consider nonlinear functions with continuous variables. We find that smoothing algorithms for (FMX) may only have sublinear rates of convergence, but their complexity in the number of functions is competitive with other algorithms. We present two new smoothing algorithms with novel precision-adjustment schemes for (FMX). For (SMX) algorithms, we present a novel way of expressing rate of convergence in terms of computational work instead of the typical number of iterations, and show how the new way allows for a fairer comparison of different algorithms. We propose a new approach to solve (MXM), based on discretization and reformulation of (MXM) as a constrained finite minimax problem. Our approach is the first to solve (MXM) in the general case where the innermost feasible region depends on the variables in the outer problems. We conduct numerical studies for all three problems.
EXECUTIVE SUMMARY


Optimization problems with uncertain parameters arise in numerous applications. One possible approach to handle such problems is to consider the worst-case value of the uncertain parameters during optimization. We consider three problems resulting from this approach: a finite minimax problem (FMX), a semi-infinite minimax problem (SMX), and a semi-infinite min-max-min problem (MXM). In all problems, we consider nonlinear functions with continuous variables. We develop rate-of-convergence and complexity results and propose algorithms for solving these optimization problems.

In (FMX), we solve a minimax problem with a finite number of variables and functions. We develop rate-of-convergence and complexity results for smoothing algorithms that solve (FMX) with many functions. We find that smoothing algorithms may only have sublinear rates of convergence, but their complexity in the number of functions is competitive with other algorithms because of their small computational work per iteration. We present two smoothing algorithms with novel precision-adjustment schemes and carry out a comprehensive numerical comparison with other algorithms from the literature. We find that the proposed algorithms are competitive with SQP algorithms, and are especially efficient for problem instances with many variables or where a significant number of functions are nearly active at stationary points.

The numerical results also indicate that, because of memory limitations, smoothing with first-order gradient methods is likely the only viable approach to solve (FMX) with a large number of functions and variables.

For (SMX), we solve a minimax problem with a finite number of variables but an infinite number of functions. We develop and compare rate-of-convergence results for various fixed and adaptive discretization algorithms, as well as an ε-subgradient algorithm. We present a novel way of expressing rate of convergence in terms of computational work instead of the typical number of iterations. Hence, we are able to identify algorithms that are competitive due to low computational work per iteration even if they require many iterations. We show that a fixed discretization algorithm with a quadratically or linearly convergent algorithm map for the discretized problem can achieve the same asymptotic convergence rate as an adaptive discretization algorithm. We show that under certain convexity assumptions, the rates of convergence for discretization algorithms depend on the dimension of the uncertain parameters, while under certain convexity-concavity assumptions, the rates of convergence for ε-subgradient algorithms are independent of that dimension. This indicates that, under convexity-concavity assumptions, discretization algorithms are not competitive with ε-subgradient algorithms for problems with high-dimensional uncertain parameters, a conclusion that our numerical results validate.

In (MXM), the variables in each layer of the problem vary within compact continuous sets. We consider two cases, depending on whether the inner feasible region is a constant set, which we denote by (SMXM), or depends on decision variables of the outer min-max problem, which we call the generalized semi-infinite min-max-min problem and denote by (GMXM). We propose a new approach to solve (MXM), based on discretization and reformulation of (MXM) into a constrained finite minimax problem with a larger dimensionality than the original (MXM). Our approach is the first in the literature to solve (GMXM), and it also solves (SMXM). We apply our approach to a defender-attacker-defender network interdiction problem, which demonstrates the viability of the approach.


I. INTRODUCTION

A. MOTIVATION AND BACKGROUND

Most, if not all, real-world decisions are made under some uncertainty. For example, Apple needs to decide on the plant capacity for manufacturing the iPad before knowing the demand for it; the Department of Homeland Security needs to make investment and operational decisions without knowing where and how the next terrorist attack will occur; and almost everyone invests in stocks and bonds without knowing whether they will turn out to be profitable.

In optimization models, uncertainty usually appears as uncertain parameters in the model formulation. A common way to handle the uncertainty is to replace each uncertain parameter by its average value or its most likely value, and then use deterministic optimization to find an optimal solution. Many examples show that optimal solutions based on such point estimates of uncertain parameters are not robust, i.e., small changes to the parameters cause the previously optimal solution to have a much worse outcome.

The importance of considering uncertainty in optimization can be seen in the number of techniques developed for it (Sahinidis, 2004; Rockafellar, 2007), among which are the tools of stochastic programming (Shapiro, Dentcheva, & Ruszczyński, 2009) and stochastic dynamic programming (Powell, 2007). The main challenge with these techniques is the availability, or even the existence, of a probability distribution for certain parameters.

An example where no probability distribution exists is an adversarial situation, in which an adversary wants to maximize damage to you or minimize your ability to achieve certain objectives. In such problems it is reasonable to use a minimax formulation that minimizes the worst-case damage the adversary can cause. Optimizing our actions against the worst-case scenario is the topic of this dissertation.
B. SCOPE OF DISSERTATION

In this dissertation, we consider three problems of increasing difficulty: an unconstrained finite minimax problem (FMX), an unconstrained semi-infinite minimax problem (SMX), and a constrained semi-infinite min-max-min problem (MXM). Specifically, we develop rate-of-convergence and complexity results, as well as algorithms, for solving these problems. In all problems, we consider nonlinear functions with continuous variables.

In (FMX), we solve a minimax problem with a finite number of variables and functions. There are several approaches to solve (FMX); we consider one of them, smoothing algorithms. In smoothing algorithms (see for example Polak, Royset, & Womersley, 2003; Polak, Womersley, & Yin, 2008; Ye, Liu, Zhou, & Liu, 2008; Li, 1992; Xu, 2001), we create a smooth function that approximates the non-differentiable pointwise maximum function and minimize the smooth approximating function. As noted in Polak et al. (2003), the key strength of smoothing algorithms is that they convert minimax problems into simple, smooth, and unconstrained optimization problems that can be solved using any standard unconstrained optimization algorithm. While complexity and rate of convergence have been studied extensively for nonlinear programs and minimax problems (see for example Nemirovski & Yudin, 1983; Drezner, 1987; Wiest & Polak, 1991; Nesterov, 1995; Ariyawansa & Jiang, 2000; Nesterov & Vial, 2004; Nesterov, 2004), these topics have been largely overlooked in the specific context of smoothing algorithms for (FMX). We discuss complexity and rate of convergence for smoothing algorithms for (FMX), and propose two new smoothing algorithms to solve (FMX). In the numerical studies, we consider problem instances of (FMX) with up to 10,000,000 functions and up to 10,000 variables to compare the new smoothing algorithms with other algorithms from the literature.

For (SMX), we solve a minimax problem with a finite number of variables but an infinite number of functions. The focus of our research for (SMX) is a novel way of expressing the rate of convergence of algorithms. Consider two (SMX) algorithms, one linearly convergent and one superlinearly convergent. Since conventional rates of convergence do not account for computational work, it is possible for the linearly convergent algorithm to generate an iterate every second while the superlinearly convergent algorithm generates an iterate every hour. Worse still, the superlinearly convergent algorithm may take an hour to generate the first iterate, with the run time needed to generate each subsequent iterate doubling at every iteration. In such cases, a faster conventional rate of convergence says little about run time, precisely because conventional rates of convergence do not account for computational work. We propose a new way of expressing rate of convergence that accounts for computational work. We select several (SMX) algorithms to illustrate how the new way of expressing rate of convergence addresses the issues of the conventional way described above. Specifically, we examine discretization and ε-subgradient algorithms. Discretization algorithms are among the more popular classes of algorithms for solving semi-infinite programs (SIPs) due to their simplicity. In discretization algorithms, we solve a sequence of finite minimax problems in which the number of functions considered increases. Since the computational work to solve a finite minimax problem depends on the number of functions in the problem, a discretization algorithm takes increasingly longer to generate an iterate as it progresses. An ε-subgradient algorithm does not use discretization to solve (SMX) and is well known to have a sublinear rate of convergence, but its run time does not vary much between iterations. We show that, compared to the conventional way of expressing rate of convergence, the new way allows a fairer comparison between the ε-subgradient algorithm and discretization algorithms. We also conduct numerical studies to validate the rate-of-convergence results that we obtain.
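To make the hourly-iterate example above concrete, the following Python sketch compares two hypothetical algorithms by the error each reaches within a fixed run-time budget rather than after a fixed number of iterations. The cost and error models are illustrative assumptions, not results from this dissertation: 1 s per iterate with $e_n = 0.9^n$ for the linearly convergent algorithm, and a one-hour first iterate whose cost doubles thereafter, with $e_n = 0.5^{2^n}$, for the quadratically convergent one. Errors are tracked as base-10 logarithms to avoid floating-point underflow.

```python
import math


def log10_error_within_budget(budget, iter_cost, log10_error):
    """Return log10 of the error of the last iterate completed within `budget` seconds.

    iter_cost(k)   -- assumed run time (s) needed to compute iterate k, k = 1, 2, ...
    log10_error(k) -- assumed log10 of the error e_k of iterate k (k = 0 is the start point)
    """
    elapsed, k = 0.0, 0
    while elapsed + iter_cost(k + 1) <= budget:
        elapsed += iter_cost(k + 1)
        k += 1
    return log10_error(k)


# Hypothetical linearly convergent algorithm: 1 s per iterate, e_k = 0.9**k.
def linear_cost(k):
    return 1.0


def linear_log10_error(k):
    return k * math.log10(0.9)


# Hypothetical quadratically convergent algorithm: the first iterate takes one hour and
# each later iterate takes twice as long as the previous one; e_k = 0.5**(2**k).
def quadratic_cost(k):
    return 3600.0 * 2 ** (k - 1)


def quadratic_log10_error(k):
    return (2 ** k) * math.log10(0.5)


for budget in (3600.0, 86400.0):  # one hour and one day of run time
    lin = log10_error_within_budget(budget, linear_cost, linear_log10_error)
    quad = log10_error_within_budget(budget, quadratic_cost, quadratic_log10_error)
    print(f"budget {budget:>7.0f} s: log10 error linear {lin:9.1f}, quadratic {quad:6.2f}")
```

Under these assumed models, after one hour the linearly convergent algorithm has reached an error of about $10^{-165}$, while the quadratically convergent one has completed a single iterate with error about $10^{-0.6}$. Measured per unit of work rather than per iteration, the algorithm with the "faster" rate is the slower one.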

In (MXM), the variables in each layer of the problem vary within a compact set with uncountable cardinality. We consider two cases, depending on whether the inner feasible region is a constant set, which we denote by (SMXM), or depends on decision variables of the outer min-max problem, which we call the generalized semi-infinite min-max-min problem and denote by (GMXM). The problem (MXM) is difficult to solve, which explains the rather limited literature on (SMXM); so far, there is no solution approach for (GMXM). We propose a new approach to solve (MXM), based on discretization and reformulation of (MXM) into a constrained finite minimax problem. We apply the approach to a defender-attacker-defender network interdiction problem on a 10-node, 18-arc network to demonstrate its viability.

C. CONTRIBUTIONS

The main contributions of this dissertation are as follows. We provide the first complexity and rate-of-convergence analyses of smoothing algorithms for solving (FMX). We develop two new smoothing algorithms with novel precision-adjustment schemes. We conduct a comprehensive numerical comparison of our algorithms with other algorithms from the literature, considering problem instances in which the number of functions is two orders of magnitude larger than in the problem instances considered in the literature. The numerical results indicate that the two new smoothing algorithms are competitive with the other algorithms compared.

For (SMX), we present a novel way of expressing rate of convergence in terms of computational work instead of the typical number of iterations, which allows for a fairer comparison of algorithms. We show that a fixed discretization algorithm with a quadratically or linearly convergent algorithm map can achieve the same asymptotic convergence rate in terms of computational work as an adaptive discretization algorithm. We show that under certain convexity-concavity assumptions, discretization algorithms are not competitive with ε-subgradient algorithms for problems with high-dimensional uncertain parameters, which we also validate in numerical tests.
We develop the first exact algorithm for (GMXM), which also yields a novel approach for solving (SMXM). If (MXM) has an objective function that is convex in the inner and outer minimization variables, and the inner and outer feasible regions are convex, then our algorithm is guaranteed to converge to a global minimizer of (MXM).

D. MATHEMATICAL BACKGROUND

This section defines notation and mathematical concepts used throughout this dissertation. Throughout the dissertation, $\mathbb{R}^n$ denotes the $n$-dimensional Euclidean space, $\mathbb{N} = \{1, 2, \dots\}$, $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$, $|\cdot|$ represents the cardinality operator, $\|\cdot\|$ represents the Euclidean norm, $A^T$ denotes the transpose of the matrix $A$, and $x_i \to_K x$ means that, given $K \subset \mathbb{N}$, for every $\epsilon > 0$ there exists an $i_1 \in K$ such that $\|x_i - x\| < \epsilon$ for all $i \ge i_1$, $i \in K$. Apart from this notation, which denotes the same quantities throughout this dissertation, some notation may represent different quantities in different chapters.

1. Continuity of Max Functions

The following results on the continuity of the pointwise maximum function (which apply to the pointwise minimum as well) are used repeatedly throughout the dissertation.

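Proposition 1.1. Suppose that the functions $f^j : \mathbb{R}^d \to \mathbb{R}$, $j \in Q = \{1, 2, \dots, q\}$, $q \in \mathbb{N}$, are continuous for all $x \in \mathbb{R}^d$, $d \in \mathbb{N}$. Then the pointwise maximum function $\psi : \mathbb{R}^d \to \mathbb{R}$ defined by

$$\psi(x) = \max_{j \in Q} f^j(x) \tag{1.1}$$

is continuous for all $x \in \mathbb{R}^d$. □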

Proposition 1.2. Let $Y \subset \mathbb{R}^m$ be a compact set, and let $\phi : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}$ be continuous on $\mathbb{R}^d \times Y$. Then the pointwise maximum function $\psi : \mathbb{R}^d \to \mathbb{R}$ defined by

$$\psi(x) = \max_{y \in Y} \phi(x, y) \tag{1.2}$$

is continuous for all $x \in \mathbb{R}^d$. □

The proofs of Propositions 1.1 and 1.2 can be found on pp. 51 and 187 of Demyanov and Malozemov (1974), respectively.

2. Rate of Convergence

Two key performance measures of an optimization algorithm are its complexity and rate of convergence. We define the different rates of convergence next, based on Bertsekas (1999, pp. 63-65) and Nocedal and Wright (2006, pp. 619-620).

Consider a sequence of points $\{x_n\}_{n=0}^{\infty} \subset \mathbb{R}^d$ converging to $x^* \in \mathbb{R}^d$. The rate of convergence can be evaluated using an error function whose values at the iterates we denote by $e_n$, where $e_n \ge 0$ for all $n \in \mathbb{N}_0$ and $e_n \to 0$ as $n \to \infty$. Two commonly used error functions are based on Euclidean distance,

$$e_n = \|x_n - x^*\|, \tag{1.3}$$

and on function values,

$$e_n = |f(x_n) - f(x^*)|. \tag{1.4}$$

We say that the convergence is sublinear if

$$\limsup_{n \to \infty} \frac{e_{n+1}}{e_n} = 1. \tag{1.5}$$

The convergence is linear if there exist $c \in (0, 1)$ and $n_1 \in \mathbb{N}_0$ such that

$$\frac{e_{n+1}}{e_n} \le c \tag{1.6}$$

for all $n \ge n_1$. The convergence is superlinear if

$$\limsup_{n \to \infty} \frac{e_{n+1}}{e_n} = 0. \tag{1.7}$$

We say that we have order of convergence $r > 1$ if there exist a $c > 0$ and an $n_1 \in \mathbb{N}_0$ such that

$$\frac{e_{n+1}}{(e_n)^r} \le c \tag{1.8}$$

for all $n \ge n_1$. When $r = 2$, we call the convergence quadratic.
If two sequences converge sublinearly, we say that they have the same rate, even if the constants are different. Similar comments hold for linear and superlinear convergence: two sequences that are both superlinear converge at the same rate, and if their orders differ, we say that they converge at the same rate but with different orders. We also say that superlinear convergence is faster than linear convergence, which in turn is faster than sublinear convergence.

We say that an algorithm map used to solve a problem (P) converges sublinearly, linearly, or superlinearly if the sequence generated by the algorithm map converges sublinearly, linearly, or superlinearly, respectively.
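The definitions (1.5) to (1.8) can be checked empirically on a computed error sequence. The Python sketch below prints the ratios $e_{n+1}/e_n$ and estimates the order $r$ from three consecutive errors via $r \approx \ln(e_{n+2}/e_{n+1}) / \ln(e_{n+1}/e_n)$, which is exact when $e_{n+1} = c\,(e_n)^r$. The three error sequences are synthetic examples chosen for illustration, not output of any algorithm in this dissertation.

```python
import math


def ratios(errors):
    """Successive ratios e_{n+1} / e_n that appear in (1.5)-(1.7)."""
    return [b / a for a, b in zip(errors, errors[1:])]


def estimated_orders(errors):
    """Estimate the order r in (1.8) from consecutive triples (e_n, e_{n+1}, e_{n+2}).

    If e_{n+1} = c * e_n**r holds exactly, then
    log(e_{n+2} / e_{n+1}) / log(e_{n+1} / e_n) = r.
    """
    return [math.log(c / b) / math.log(b / a)
            for a, b, c in zip(errors, errors[1:], errors[2:])]


sublinear = [1.0 / (n + 1) for n in range(1000)]   # ratios (n+1)/(n+2) tend to 1
linear = [0.5 ** n for n in range(30)]              # every ratio equals 0.5
quadratic = [0.5 ** (2 ** n) for n in range(6)]     # e_{n+1} = e_n**2, so r = 2

print("sublinear, last ratio:", ratios(sublinear)[-1])
print("linear, distinct ratios:", set(ratios(linear)))
print("quadratic, ratios:", ratios(quadratic))
print("quadratic, estimated orders:", estimated_orders(quadratic))
```

The ratios for the quadratic sequence decrease toward 0, consistent with superlinear convergence in (1.7), while the order estimates stay at 2.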

3. Consistent Approximations

This subsection discusses the theory of consistent approximations (Polak, 2003, 1997). Consider the problem

$$(\mathrm{P}) \quad \min_{x \in X} f(x), \tag{1.9}$$

where $X \subset \mathbb{R}^d$ and $f : \mathbb{R}^d \to \mathbb{R}$ is continuous.

Next, given $N \in \mathbb{N}$, consider an approximating problem to (P),

$$(\mathrm{P}_N) \quad \min_{x \in X_N} f_N(x), \tag{1.10}$$

where $X_N \subset \mathbb{R}^d$ and $f_N : \mathbb{R}^d \to \mathbb{R}$ is continuous.

Two properties are required for the approximating problems $(\mathrm{P}_N)$ to be consistent approximations to (P) as $N \to \infty$. First, we need the epi-convergence of $(\mathrm{P}_N)$ to (P) as $N \to \infty$. For a detailed discussion of epi-convergence, see Polak (1997, Section 3.3.1) or Rockafellar and Wets (1998, Sections 1B, 4B, and 7B). Here we give the definitions and results essential for our study.

We define the epigraph of (P) as

$$E = \{(x, z) \in \mathbb{R}^{d+1} \mid x \in X,\ z \ge f(x)\}. \tag{1.11}$$
The set $E$ consists of all the points in $\mathbb{R}^{d+1}$ on or above the graph of $f(\cdot)$ over $X$. Similarly, the epigraph of $(\mathrm{P}_N)$ is

$$E_N = \{(x, z) \in \mathbb{R}^{d+1} \mid x \in X_N,\ z \ge f_N(x)\}. \tag{1.12}$$

Epi-convergence of $(\mathrm{P}_N)$ to (P) as $N \to \infty$ is then defined as set convergence of the epigraphs $E_N$ to $E$ in the sense of Painlevé-Kuratowski, as in Definition 5.3.6 of Polak (1997). For completeness, we restate the definition of set convergence in the sense of Painlevé-Kuratowski.


Definition 1.1. Consider a sequence of sets $\{A_i\}_{i=0}^{\infty} \subset \mathbb{R}^d$.

(i) We define the distance between a point $x$ and a set $A_i$ as

$$\rho(x, A_i) = \inf\{\|x' - x\| \mid x' \in A_i\}. \tag{1.13}$$

The point $x$ is a limit point of $\{A_i\}_{i=0}^{\infty}$ if $\rho(x, A_i) \to 0$ as $i \to \infty$ (that is, $x$ is a limit point of $\{A_i\}_{i=0}^{\infty}$ if there exist $x_i \in A_i$ for all $i \in \mathbb{N}$ such that $x_i \to x$ as $i \to \infty$).

(ii) The point $x$ is a cluster point of $\{A_i\}_{i=0}^{\infty}$ if it is a limit point of a subsequence of $\{A_i\}_{i=0}^{\infty}$.

(iii) We denote the set of limit points of $\{A_i\}_{i=0}^{\infty}$ by $\liminf A_i$, and the set of cluster points of $\{A_i\}_{i=0}^{\infty}$ by $\limsup A_i$.

(iv) The sets $A_i$ converge in the sense of Painlevé-Kuratowski to the set $A$ as $i \to \infty$ if $\liminf A_i = \limsup A_i = A$. □

An alternative way to establish epi-convergence is provided by the following proposition, extracted from Polak (2003, Theorem 3.1).

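Proposition 1.3. The sequence of problems $\{(\mathrm{P}_N)\}_{N \in \mathbb{N}}$ epi-converges to (P) as $N \to \infty$ if and only if

(i) for every $x \in X$, there exists a sequence $\{x_N\}_{N \in \mathbb{N}}$, with $x_N \in X_N$, such that $x_N \to x$ as $N \to \infty$ and $\limsup_{N \to \infty} f_N(x_N) \le f(x)$; and

(ii) for every infinite sequence $\{x_N\}_{N \in K}$, where $K \subset \mathbb{N}$, $x_N \in X_N$ for all $N \in K$, and $x_N \to_K x$ as $N \to \infty$, we have $x \in X$ and $\liminf_{N \to \infty, N \in K} f_N(x_N) \ge f(x)$. □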


The importance of epi-convergence is stated in the next result, extracted from Polak (2003, Theorem 3.2).

Proposition 1.4. Suppose that the sequence of problems $\{(\mathrm{P}_N)\}_{N \in \mathbb{N}}$ epi-converges to (P) as $N \to \infty$. Then the following facts hold:

(i) If $\{x_N\}$ is a sequence of global minimizers of $(\mathrm{P}_N)$ and there exists an infinite subset $K \subset \mathbb{N}$ such that $x_N \to_K x$ as $N \to \infty$, then $x$ is a global minimizer of (P), and $f_N(x_N) \to_K f(x)$ as $N \to \infty$.

(ii) If $\{x_N\}$ is a sequence of local minimizers of $(\mathrm{P}_N)$ sharing a common radius of attraction $\rho > 0$ (i.e., for all $N \in \mathbb{N}$, $f_N(x_N) \le f_N(x)$ for all $x \in X_N$ such that $\|x - x_N\| \le \rho$), and there exists an infinite subset $K \subset \mathbb{N}$ such that $x_N \to_K x$ as $N \to \infty$, then $x$ is a local minimizer of (P), and $f_N(x_N) \to_K f(x)$ as $N \to \infty$.

□

Epi-convergence does not rule out the possibility that an arbitrary sequence of local minimizers of $(\mathrm{P}_N)$ has an accumulation point that is neither a local minimizer nor a stationary point of (P). To ensure that accumulation points of a sequence of stationary points of $(\mathrm{P}_N)$ are stationary points of (P), a suitable characterization of stationarity is required, such as the optimality functions defined in Definition 3.3 of Polak (2003).

Definition 1.2. A function $\theta : \mathbb{R}^d \to \mathbb{R}$ is an optimality function for (P) if (i) $\theta(\cdot)$ is upper semicontinuous, (ii) $\theta(x) \le 0$ for all $x \in \mathbb{R}^d$, and (iii) if $x$ is a local minimizer of (P), then $\theta(x) = 0$. Similarly, a function $\theta_N : \mathbb{R}^d \to \mathbb{R}$ is an optimality function for $(\mathrm{P}_N)$ if (i) $\theta_N(\cdot)$ is upper semicontinuous, (ii) $\theta_N(x) \le 0$ for all $x \in \mathbb{R}^d$, and (iii) if $x_N$ is a local minimizer of $(\mathrm{P}_N)$, then $\theta_N(x_N) = 0$. □

We next define consistent approximations, as per Definition 3.4 of Polak (2003).

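Definition 1.3. The pairs $((\mathrm{P}_N), \theta_N(\cdot))$ in the sequence $\{((\mathrm{P}_N), \theta_N(\cdot))\}_{N \in \mathbb{N}}$ are consistent approximations to the pair $((\mathrm{P}), \theta(\cdot))$ if (i) $(\mathrm{P}_N)$ epi-converges to (P) as $N \to \infty$, and (ii) for any infinite sequence $\{x_N\}_{N \in K}$, $K \subset \mathbb{N}$, with $x_N \to_K x$, we have $\limsup_{N \to \infty, N \in K} \theta_N(x_N) \le \theta(x)$. □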
Consistent approximations ensure that, given a sequence of approximate stationary points $\{x_N\}$ with $\theta_N(x_N) \to 0$ and $x_N \to x$ as $N \to \infty$, we have $\theta(x) = 0$, i.e., $x$ is a stationary point of (P).
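The one-line argument behind this statement combines Definition 1.3(ii), applied with $K = \mathbb{N}$, with property (ii) of the optimality function $\theta(\cdot)$ in Definition 1.2:

```latex
0 = \lim_{N \to \infty} \theta_N(x_N)
  = \limsup_{N \to \infty} \theta_N(x_N)
  \le \theta(x) \le 0
\quad \Longrightarrow \quad \theta(x) = 0 .
```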

E. ORGANIZATION

The remainder of this dissertation is organized as follows. Chapter II develops rate-of-convergence and complexity results for smoothing algorithms that solve (FMX). We present two new smoothing algorithms with novel precision-adjustment schemes and carry out a comprehensive numerical comparison with other algorithms from the literature. In Chapter III, we present a novel way of expressing rate of convergence in terms of computational work instead of the typical number of iterations. We develop and compare rate-of-convergence results for various fixed and adaptive discretization algorithms, as well as an ε-subgradient algorithm. In Chapter IV, we propose a new approach to solve (MXM) and apply it to a defender-attacker-defender network interdiction problem to illustrate its viability. Chapter V presents conclusions and future research opportunities.




II. FINITE MINIMAX PROBLEM


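A. INTRODUCTION

This chapter considers finite minimax problems of the form

$$(\mathrm{FMX}) \quad \min_{x \in \mathbb{R}^d} \psi(x), \tag{II.1}$$

where $\psi : \mathbb{R}^d \to \mathbb{R}$ is defined by

$$\psi(x) = \max_{j \in Q} f^j(x), \tag{II.2}$$

and $f^j : \mathbb{R}^d \to \mathbb{R}$, $j \in Q = \{1, 2, \dots, q\}$, $q \in \mathbb{N}$, are twice continuously differentiable.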
(FMX) is “finite” because we consider a finite number of functions, in contrast to the “semi-infinite” problems (SMX) and (MXM), where we consider an infinite number of functions. Finite minimax problems of the form (FMX) may occur in engineering design (Polak, 1987), control system design (Polak, Salcudean, & Mayne, 1987), portfolio optimization (Cai, Teo, Yang, & Zhou, 2000), best polynomial approximation (Demyanov & Malozemov, 1974), or as subproblems in semi-infinite minimax algorithms (Panier & Tits, 1989). We focus on minimax problems with many functions, i.e., large $q$, which may result from finely discretized semi-infinite minimax problems or optimal control problems; see for example Panier and Tits (1989) and Zhou and Tits (1996). We develop algorithms for such problems and analyze their efficiency. An abbreviated version of this chapter is published separately (Pee & Royset, 2010).

The non-differentiability of the objective function in (FMX) poses the main challenge for solving minimax problems, as standard unconstrained optimization algorithms do not apply directly. Many algorithms have been proposed to solve (FMX); see for example Zhou and Tits (1996), Polak et al. (2003), Obasanjo et al. (2010), and references therein. One approach is sequential quadratic programming (SQP), where (FMX) is first reformulated into the standard nonlinear constrained problem

$$(\mathrm{FMX}') \quad \min_{(x, z) \in \mathbb{R}^{d+1}} \{ z \mid f^j(x) - z \le 0 \ \forall j \in Q \} \tag{II.3}$$
and then an SQP algorithm is applied to (FMX'), exploiting the special structure of the new formulation (Zhou & Tits, 1996; Zhu, Cai, & Jian, 2009). Other approaches also based on (FMX') include interior point methods (Sturm & Zhang, 1995; Obasanjo et al., 2010; Luksan, Matonoha, & Vlcek, 2005) and conjugate gradient methods in conjunction with exact penalties and smoothing (Ye et al., 2008).
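The equivalence between (FMX) and (FMX') that these approaches rely on can be seen directly: for a fixed $x$, the constraints in (II.3) state exactly that $z \ge \psi(x)$, so the smallest feasible $z$ is $\psi(x)$ (read each minimum as an infimum if it is not attained):

```latex
\min_{(x, z) \in \mathbb{R}^{d+1}} \{ z \mid f^j(x) - z \le 0 \ \forall j \in Q \}
  = \min_{x \in \mathbb{R}^d} \, \min_{z \in \mathbb{R}} \{ z \mid z \ge \psi(x) \}
  = \min_{x \in \mathbb{R}^d} \psi(x) .
```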

Due to its aggressive active-set strategy, the SQP algorithm in Zhou and Tits (1996) appears especially promising for problems with many sequentially related functions (in the sense that the values taken by $f^j(\cdot)$ are typically close to the values taken by $f^{j+1}(\cdot)$), as in the case of finely discretized semi-infinite minimax problems. The SQP algorithm in Zhou and Tits (1996) needs to solve two quadratic programs (QPs) in each iteration. Recently, Zhu et al. (2009) proposed an SQP algorithm that requires the solution of only one QP per iteration, yet retains the global convergence and superlinear rate of convergence of the algorithm in Zhou and Tits (1996). Furthermore, the algorithm in Zhu et al. (2009) does not use an active-set strategy. At a point $x \in \mathbb{R}^d$, we call a function $f^j(\cdot)$, $j \in Q$, active if $f^j(x) = \psi(x)$, and $\epsilon$-active ($\epsilon > 0$) if $f^j(x) \ge \psi(x) - \epsilon$. In general, an active-set strategy considers only the functions that are $\epsilon$-active at the current iterate (and disregards the others), and thus greatly reduces the number of function and gradient evaluations at each iteration of an algorithm. While the number of iterations needed to solve a problem to the required precision may increase, the overall effect may be a reduction in the number of function and gradient evaluations, which may translate into reduced computing times. For example, Polak et al. (2008) report a 75% reduction in the number of gradient evaluations, and Zhou and Tits (1996) report reductions in computing times with active-set strategies.
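As a small illustration of the definitions above, the Python sketch below returns the $\epsilon$-active indices given the values $f^j(x)$, $j \in Q$, at a point $x$. It assumes the function values have already been computed, uses 0-based indices in place of $Q = \{1, \dots, q\}$, and is not the active-set strategy of any particular cited algorithm.

```python
def eps_active_set(fvals, eps):
    """Return the (0-based) indices j with f^j(x) >= psi(x) - eps.

    fvals -- list of the values f^j(x) at the current point x
    eps   -- tolerance eps > 0; eps = 0 gives the indices of the active functions
    """
    psi = max(fvals)  # psi(x) = max_j f^j(x)
    return [j for j, v in enumerate(fvals) if v >= psi - eps]


# Hypothetical values of q = 5 functions at some point x.
fvals = [1.0, 0.95, 0.3, 1.0, -2.0]
print(eps_active_set(fvals, 0.1))  # -> [0, 1, 3]
print(eps_active_set(fvals, 0.0))  # -> [0, 3]
```

An algorithm with an active-set strategy would then evaluate gradients only for the returned indices, which is where the savings in gradient evaluations come from when $q$ is large.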

In smoothing algorithms (see for example Polak et al., 2003, 2008; Ye et al., 2008; Li, 1992; Xu, 2001), we create a smooth function (using exponential smoothing, to be discussed in Section II.B) that approximates the non-differentiable $\psi(\cdot)$, and minimize the smooth approximating function. We refer to the resulting problem of minimizing the smooth approximating function as a smoothed problem. As the smoothed problem remains unconstrained, one can use any standard unconstrained optimization algorithm, such as the Armijo gradient or Newton methods (Polak et al., 2003) or a quasi-Newton method (Polak et al., 2008).
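Section II.B is not part of this excerpt, so the following Python sketch assumes the standard exponential (log-sum-exp) smoothing function $\psi_p(x) = \frac{1}{p} \ln \sum_{j \in Q} \exp(p f^j(x))$ with precision parameter $p > 0$. Because $\exp(p\,\psi(x)) \le \sum_{j \in Q} \exp(p f^j(x)) \le q \exp(p\,\psi(x))$, it satisfies $0 \le \psi_p(x) - \psi(x) \le \ln(q)/p$, so a larger $p$ gives a more accurate approximation. Subtracting the maximum before exponentiating avoids overflow for large $p$.

```python
import math


def psi(fvals):
    """Pointwise maximum psi(x) = max_j f^j(x), given fvals[j] = f^j(x)."""
    return max(fvals)


def psi_p(fvals, p):
    """Exponential smoothing (1/p) * ln(sum_j exp(p * f^j(x))), with p > 0.

    Uses the identity psi_p = m + (1/p) * ln(sum_j exp(p * (f^j(x) - m))) with
    m = max_j f^j(x), so every exponent is <= 0 and exp cannot overflow.
    """
    m = max(fvals)
    return m + math.log(sum(math.exp(p * (v - m)) for v in fvals)) / p


fvals = [1.0, 0.5, -2.0, 0.999]  # hypothetical values f^j(x), q = 4
q = len(fvals)
for p in (1.0, 10.0, 100.0, 1000.0):
    gap = psi_p(fvals, p) - psi(fvals)
    print(f"p = {p:6.0f}: psi_p - psi = {gap:.6f} (bound ln(q)/p = {math.log(q) / p:.6f})")
```

The bound is attained when all $q$ functions take the same value at $x$, which is one reason a large number of nearly active functions makes accurate smoothing demand a large $p$.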

A fundamental challenge for smoothing algorithms is that the smoothed problem becomes increasingly ill-conditioned as the approximation becomes more accurate. Consequently, the use of smoothing techniques is complicated by the need to balance the accuracy of the approximation against problem ill-conditioning. The simplest smoothing algorithm constructs an accurate smooth approximating function and minimizes it. This static scheme of constructing a single smoothed problem and solving it is highly sensitive to the choice of accuracy and has poor numerical performance (Polak et al., 2003). Xu (2001) made the first attempt to address this challenge with a sequence of smoothed problems: a precision parameter that controls approximation accuracy is initially set to a pre-selected value and then doubled at each iteration. Effectively, in this open-loop precision-adjustment scheme, the algorithm approximately solves a sequence of gradually more accurate approximations. This open-loop scheme is sensitive to the multiplication factor (Polak et al., 2003).

Polak et al. (2003) propose an adaptive precision-parameter adjustment scheme that controls problem ill-conditioning by keeping the smoothing precision parameter small when far from a stationary solution and increasing it as a stationary solution is approached. Numerical results show that this scheme manages ill-conditioning better than static and open-loop schemes. The smoothing algorithms in Xu (2001) and Polak et al. (2003) do not incorporate any active-set strategy.

Using the adaptive precision-parameter adjustment scheme in Polak et al. (2003), Polak et al. (2008) present an active-set strategy for smoothing algorithms that tackles (FMX) with large $q$. We note that the convergence result in Theorem 3.3 of Polak et al. (2008) may be slightly incorrect, as it claims stationarity for all accumulation points of a sequence constructed by the algorithm in Polak et al. (2008). However, the proof of Theorem 3.3 of Polak et al. (2008) relies on Polak et al. (2003), which guarantees stationarity for only a single accumulation point.

This  chapter  examines  smoothing  algorithms  for  (FMX)  with  large  q  from  two 
angles.  First,  we  discuss  complexity  and  rate  of  convergence  for  such  algorithms.  We 
define  complexity  as  the  computational  work  of  an  algorithm  on  a  serial  machine  to 
obtain  a  solution  that  is  within  a  specified  error  tolerance  of  the  optimal  solution 
of  a  problem,  expressed  as  a  function  of  the  sizes  of  a  specific  set  of  inputs  for  the 
problem.  While  complexity  and  rate  of  convergence  have  been  studied  extensively 
for  nonlinear  programs  and  minimax  problems  (see  for  example  Nemirovski  &  Yudin, 
1983;  Drezner,  1987;  Wiest  &  Polak,  1991;  Nesterov,  1995;  Ariyawansa  &  Jiang,  2000; 
Nesterov  &  Vial,  2004;  Nesterov,  2004),  the  topics  have  been  largely  overlooked  in  the 
specific  context  of  smoothing  algorithms  for  (FMX).  A  challenge  here  is  the  increasing 
ill-conditioning  of  the  smoothed  problem  as  the  smoothing  precision  improves.  We 
quantify  the  degree  of  ill-conditioning  and  use  this  result  to  analyze  complexity  and 
rate  of  convergence.  We  find  that  the  rate  of  convergence  may  be  sublinear,  but 
low  computational  work  per  iteration  yields  complexity,  as  a  function  of  q,  that  is 
competitive  with  several  other  algorithms. 

Second, we consider the implementation and numerical performance of smoothing algorithms. A challenge here is to construct schemes for selecting the precision parameter that guarantee convergence to stationary points and perform well empirically. As discussed above, static and open-loop precision-parameter adjustment schemes result in poor numerical performance; thus, we develop two adaptive schemes. In extensive tests against other algorithms, smoothing algorithms with the adaptive schemes are competitive, especially for problems with many variables or where a significant number of functions are nearly active at stationary points.




B.  EXPONENTIAL  SMOOTHING 


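For ease of analysis of active-set strategies, we consider the problem

$$(\mathrm{FMX}_\Omega) \qquad \min_{x \in \mathbb{R}^d} \psi_\Omega(x), \tag{II.4}$$

where $\psi_\Omega(x) = \max_{j \in \Omega} f^j(x)$ and $\Omega \subset Q$. When $\Omega = Q$, $(\mathrm{FMX}_\Omega)$ is identical to (FMX). For simplicity of notation, we drop the subscript $Q$ in several contexts below. Next, for any $\Omega \subset Q$ and a parameter $p > 0$, we define a smoothed problem of $(\mathrm{FMX}_\Omega)$ by

$$(\mathrm{FMX}_{p\Omega}) \qquad \min_{x \in \mathbb{R}^d} \psi_{p\Omega}(x), \tag{II.5}$$

where

$$\psi_{p\Omega}(x) = \frac{1}{p} \log\Big(\sum_{j \in \Omega} \exp\big(p f^j(x)\big)\Big) = \psi_\Omega(x) + \frac{1}{p} \log\Big(\sum_{j \in \Omega} \exp\big(p\,(f^j(x) - \psi_\Omega(x))\big)\Big) \tag{II.6}$$

is an exponential penalty function. We denote $(\mathrm{FMX}_{pQ})$ by $(\mathrm{FMX}_p)$ for brevity.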
This smoothing technique was introduced in Kort and Bertsekas (1972) and used in Polak et al. (2003, 2008), Ye et al. (2008), Li (1992), and Xu (2001). The exponential penalty function is commonly used in smoothing algorithms because it preserves differentiability (as formalized in Proposition II.1) and convexity (Li & Fang, 1997).
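The second form of (II.6) is also a numerically safe way to evaluate $\psi_{p\Omega}$: after $\psi_\Omega(x)$ is subtracted, every exponent is nonpositive, so large values of $p$ cannot cause overflow. The following NumPy sketch evaluates $\psi_{p\Omega}(x)$ from a vector of function values and checks the bounds of Proposition II.1(ii) for a few illustrative values of $p$. The helper name `smoothed_max` is hypothetical, and the sketch is an illustration rather than code used in this dissertation.

```python
import numpy as np


def smoothed_max(f_values, p):
    """Exponential penalty function psi_p of (II.6), evaluated in its shifted form.

    psi_p = psi + (1/p) * log(sum_j exp(p * (f^j - psi))), with psi = max_j f^j.
    Every exponent is <= 0, so exp() cannot overflow even for very large p.
    """
    f = np.asarray(f_values, dtype=float)
    psi = f.max()
    return psi + np.log(np.exp(p * (f - psi)).sum()) / p


f = np.array([1.0, 0.7, 0.2])              # values f^j(x) at some fixed x
for p in (1.0, 10.0, 100.0, 1e6):
    gap = smoothed_max(f, p) - f.max()     # psi_p(x) - psi(x)
    print(p, gap, np.log(f.size) / p)      # gap lies in [0, log(q)/p]
```

By contrast, evaluating the first form of (II.6) directly, as `np.log(np.exp(p * f).sum()) / p`, overflows to `inf` in double precision once `p * f` exceeds roughly 709.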

We denote the set of active functions at $x \in \mathbb{R}^d$ by $\hat{\Omega}(x) = \{ j \in \Omega \mid f^j(x) = \psi_\Omega(x) \}$. Except as stated in Appendix A, we denote components of a vector by superscripts.

The parameter $p > 0$ is a smoothing precision parameter, where a larger $p$ implies higher precision, as illustrated in Figure 1 and formalized by Proposition II.1; see for example Polak et al. (2008). In Figure 1, $\Omega = \{1, 2, 3\}$ and the subscript $\Omega$ has been dropped from the notation. The numbers in the subscripts are $p$ values.

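Figure 1. Smoothed Problems.

Proposition II.1. Suppose that $\Omega \subset Q$ and $p > 0$.

(i) If the functions $f^j(\cdot)$, $j \in \Omega$, are continuous, then $\psi_{p\Omega}(\cdot)$ is continuous, and for any $x \in \mathbb{R}^d$, $\psi_{p\Omega}(x)$ decreases monotonically as $p$ increases.

(ii) For any $x \in \mathbb{R}^d$,

$$0 \le \frac{\log |\hat{\Omega}(x)|}{p} \le \psi_{p\Omega}(x) - \psi_\Omega(x) \le \frac{\log |\Omega|}{p}, \tag{II.7}$$

where $|\cdot|$ represents the cardinality operator.

(iii) If the functions $f^j(\cdot)$, $j \in \Omega$, are continuously differentiable, then $\psi_{p\Omega}(\cdot)$ is continuously differentiable, with gradient

$$\nabla \psi_{p\Omega}(x) = \sum_{j \in \Omega} \mu_p^j(x) \nabla f^j(x), \tag{II.8}$$

where

$$\mu_p^j(x) \triangleq \frac{\exp\big(p f^j(x)\big)}{\sum_{k \in \Omega} \exp\big(p f^k(x)\big)} = \frac{\exp\big(p[f^j(x) - \psi_\Omega(x)]\big)}{\sum_{k \in \Omega} \exp\big(p[f^k(x) - \psi_\Omega(x)]\big)} \in (0, 1), \tag{II.9}$$

and $\sum_{j \in \Omega} \mu_p^j(x) = 1$ for all $x \in \mathbb{R}^d$.

(iv) If the functions $f^j(\cdot)$, $j \in \Omega$, are twice continuously differentiable, then $\psi_{p\Omega}(\cdot)$ is twice continuously differentiable, with Hessian

$$\nabla^2 \psi_{p\Omega}(x) = \sum_{j \in \Omega} \mu_p^j(x) \nabla^2 f^j(x) + p \sum_{j \in \Omega} \mu_p^j(x) \nabla f^j(x) \nabla f^j(x)^T - p \Big[\sum_{j \in \Omega} \mu_p^j(x) \nabla f^j(x)\Big]\Big[\sum_{j \in \Omega} \mu_p^j(x) \nabla f^j(x)\Big]^T \tag{II.10}$$

for all $x \in \mathbb{R}^d$. □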
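Parts (iii) and (iv) say that the gradient of $\psi_{p\Omega}$ is a convex combination of the gradients $\nabla f^j(x)$ with softmax weights $\mu_p^j(x)$. The sketch below checks (II.8)-(II.9) against central finite differences. It assumes, purely for illustration, the functions $f^j(x) = \frac{1}{2}\|x - a_j\|^2$ with hypothetical centers $a_j$, for which $\nabla f^j(x) = x - a_j$; the helper names are also hypothetical.

```python
import numpy as np


def weights(f, p):
    """mu_p^j of (II.9): softmax of p * f^j, shifted by max_j f^j for stability."""
    w = np.exp(p * (f - f.max()))
    return w / w.sum()


def psi_p_and_grad(x, centers, p):
    """psi_p (II.6) and its gradient (II.8) for f^j(x) = 0.5 * ||x - a_j||^2."""
    G = x - centers                        # row j is grad f^j(x) = x - a_j
    f = 0.5 * np.sum(G**2, axis=1)         # f^j(x)
    mu = weights(f, p)
    psi_p = f.max() + np.log(np.exp(p * (f - f.max())).sum()) / p
    return psi_p, mu @ G, mu               # gradient = sum_j mu_j grad f^j(x)


centers = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
x, p, h = np.array([0.3, -0.2]), 5.0, 1e-6
_, grad, mu = psi_p_and_grad(x, centers, p)
fd = np.array([
    (psi_p_and_grad(x + h * e, centers, p)[0]
     - psi_p_and_grad(x - h * e, centers, p)[0]) / (2 * h)
    for e in np.eye(2)
])
print(np.allclose(grad, fd, atol=1e-6), mu.sum())
```

The weights sum to one up to round-off, which is the property used repeatedly in the proofs of Lemmas II.5 and II.7.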


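We define a continuous, nonpositive optimality function $\theta_\Omega : \mathbb{R}^d \to \mathbb{R}$ for all $x \in \mathbb{R}^d$ by

$$\theta_\Omega(x) = -\min_{\mu \in \Sigma_\Omega} \Big\{ \sum_{j \in \Omega} \mu^j \big(\psi_\Omega(x) - f^j(x)\big) + \frac{1}{2} \Big\| \sum_{j \in \Omega} \mu^j \nabla f^j(x) \Big\|^2 \Big\}, \tag{II.11}$$

where $\Sigma_\Omega = \{ \mu \in \mathbb{R}^{|\Omega|} \mid \mu^j \ge 0 \ \forall j \in \Omega, \ \sum_{j \in \Omega} \mu^j = 1 \}$. The following optimality condition for $(\mathrm{FMX}_\Omega)$ is expressed in terms of $\theta_\Omega(\cdot)$; see Theorems 2.1.1, 2.1.3, and 2.1.6 of Polak (1997).

Proposition II.2. Suppose that the functions $f^j(\cdot)$, $j \in Q$, are continuously differentiable and that $\Omega \subset Q$. If $x^* \in \mathbb{R}^d$ is a local minimizer for $(\mathrm{FMX}_\Omega)$, then $\theta_\Omega(x^*) = 0$. □

At stationary points of $(\mathrm{FMX}_{p\Omega})$, the continuous, nonpositive optimality function $\theta_{p\Omega} : \mathbb{R}^d \to \mathbb{R}$, defined by $\theta_{p\Omega}(x) = -\frac{1}{2} \| \nabla \psi_{p\Omega}(x) \|^2$ for all $x \in \mathbb{R}^d$, vanishes.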
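Evaluating $\theta_\Omega(x)$ in (II.11) requires minimizing a convex quadratic over the unit simplex $\Sigma_\Omega$. As one possible way to do this, the sketch below uses the Frank-Wolfe method with exact line search. This solver choice is our assumption for illustration (any QP solver would serve), and the function names are hypothetical. The sketch takes the values $f^j(x)$ and the gradients $\nabla f^j(x)$ as arrays, and also evaluates $\theta_{p\Omega}(x) = -\frac{1}{2}\|\nabla\psi_{p\Omega}(x)\|^2$ from a given gradient.

```python
import numpy as np


def theta(f_values, grads, iters=500):
    """Approximate theta_Omega(x) in (II.11) with Frank-Wolfe over the unit simplex.

    f_values: shape (q,), the values f^j(x); grads: shape (q, d), row j = grad f^j(x).
    Minimizes g(mu) = sum_j mu_j (psi(x) - f^j(x)) + 0.5 * ||sum_j mu_j grad f^j(x)||^2.
    """
    f = np.asarray(f_values, dtype=float)
    G = np.asarray(grads, dtype=float)
    c = f.max() - f                          # linear coefficients psi(x) - f^j(x) >= 0
    mu = np.full(f.size, 1.0 / f.size)       # start at the barycenter of the simplex
    for _ in range(iters):
        v = mu @ G                           # sum_j mu_j grad f^j(x)
        g_grad = c + G @ v                   # gradient of g at mu
        k = int(np.argmin(g_grad))           # Frank-Wolfe vertex e_k
        direction = -mu
        direction[k] += 1.0                  # e_k - mu
        slope = g_grad @ direction
        if slope >= -1e-15:                  # Frank-Wolfe gap ~ 0: mu is (numerically) optimal
            break
        curv = np.sum((direction @ G) ** 2)
        step = 1.0 if curv == 0.0 else min(1.0, -slope / curv)  # exact line search
        mu = mu + step * direction
    v = mu @ G
    return -float(c @ mu + 0.5 * (v @ v))


def theta_p(grad_psi_p):
    """theta_{p Omega}(x) = -0.5 * ||grad psi_p(x)||^2."""
    g = np.asarray(grad_psi_p, dtype=float)
    return -0.5 * float(g @ g)


# f^1(x) = (x - 1)^2 and f^2(x) = (x + 1)^2 in one dimension, evaluated at x = 1.
print(theta([0.0, 4.0], [[0.0], [4.0]]))    # -3.5
print(theta_p([3.0, 4.0]))                  # -12.5
```

At $x = 0$, the minimizer of $\max\{(x-1)^2, (x+1)^2\}$, the same call with values `[1.0, 1.0]` and gradients `[[-2.0], [2.0]]` returns zero, in agreement with Proposition II.2.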


C.  RATE  OF  CONVERGENCE  AND  COMPLEXITY 

This section examines the following basic smoothing algorithm, for which we develop a series of complexity and rate-of-convergence results. We use this simple algorithm to gain some fundamental insights into smoothing algorithms while maintaining tractability of the analysis. When they exist, we denote optimal solutions of (FMX) and $(\mathrm{FMX}_p)$ by $x^*$ and $x_p^*$, respectively, and the corresponding optimal values by $\psi^*$ and $\psi_p^*$. The algorithm applies the Armijo Gradient Method to $(\mathrm{FMX}_p)$, starting at an initial point $x_0$. The value of $p$ is fixed at $p^*$, which is chosen so that Proposition II.3 below holds. The Armijo Gradient Method uses the steepest descent search direction and the Armijo stepsize rule to solve an unconstrained problem; see for example Algorithm 1.3.3 of Polak (1997).

Algorithm II.1. Smoothing Armijo Gradient Algorithm
Data: Error tolerance $t > 0$, $x_0 \in \mathbb{R}^d$.

Parameter: $\delta \in (0, 1)$.

Step 1. Set $p^* = (\log q)/((1 - \delta) t)$.

Step 2. Generate a sequence $\{x_i\}_{i=0}^\infty$ by applying the Armijo Gradient Method to $(\mathrm{FMX}_{p^*})$. □

In this dissertation, several algorithms (including Algorithm II.1) have no termination criteria stated in the algorithm procedure. In nonlinear programming, there is often more than one possible termination criterion for each algorithm. For example, a possible termination criterion for unconstrained nonlinear optimization is that the norm of the search direction falls below a certain small number. Determining an appropriate criterion is often application dependent. In all our numerical studies, we terminate the algorithms when (i) the current iterate falls within a certain error tolerance of the optimal solution or optimal objective function value, or (ii) the solution satisfies the default tolerances of the solver used. We state the termination criterion in the numerical section of each chapter.

Algorithm II.1 has the following property.

Proposition II.3. Suppose that $q \ge 2$ and Step 2 of Algorithm II.1 has generated a point $x_i \in \mathbb{R}^d$ such that $\psi_{p^*}(x_i) - \psi_{p^*}^* \le \delta t$. Then $\psi(x_i) - \psi^* \le t$.

Proof. By the optimality of $\psi_{p^*}^*$ and (II.7), $\psi_{p^*}^* \le \psi_{p^*}(x^*) \le \psi^* + (\log q)/p^*$. Thus, $-\psi^* \le -\psi_{p^*}^* + (\log q)/p^*$. Based on (II.7), $\psi(x_i) \le \psi_{p^*}(x_i)$, and hence $\psi(x_i) - \psi^* \le \psi_{p^*}(x_i) - \psi_{p^*}^* + (\log q)/p^*$. Since $\psi_{p^*}(x_i) - \psi_{p^*}^* \le \delta t$ and $(\log q)/p^* = (1 - \delta) t$ by Step 1, the conclusion follows. □
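A compact Python sketch of Algorithm II.1 follows. It is illustrative only, and several of its ingredients are our assumptions rather than part of the algorithm as stated: the test functions $f^j(x) = \frac{1}{2}\|x - a_j\|^2$ (strongly convex with $m = 1$), the Armijo parameters `alpha` and `beta`, the gradient-norm stopping rule, the iteration cap, and the step-size safeguard (Algorithm II.1 itself states no termination criterion). With the four hypothetical centers $\pm e_1$, $\pm e_2$, the minimizer is $x^* = 0$ with $\psi^* = 1/2$, so the guarantee of Proposition II.3 can be checked directly.

```python
import numpy as np


def make_problem(centers):
    """Hypothetical test functions f^j(x) = 0.5 * ||x - a_j||^2, one per center a_j."""
    A = np.asarray(centers, dtype=float)
    return (lambda x: 0.5 * np.sum((x - A) ** 2, axis=1)), (lambda x: x - A)


def psi_p(x, p, values):
    f = values(x)                                   # shifted form of (II.6)
    return f.max() + np.log(np.exp(p * (f - f.max())).sum()) / p


def grad_psi_p(x, p, values, grads):
    f = values(x)
    w = np.exp(p * (f - f.max()))
    return (w / w.sum()) @ grads(x)                 # (II.8) with weights (II.9)


def smoothing_armijo_gradient(x0, t, delta, values, grads,
                              alpha=0.5, beta=0.8, max_iter=5000, gtol=1e-6):
    x = np.asarray(x0, dtype=float)
    q = values(x).size
    p_star = np.log(q) / ((1.0 - delta) * t)        # Step 1
    for _ in range(max_iter):                       # Step 2: Armijo Gradient Method
        val, g = psi_p(x, p_star, values), grad_psi_p(x, p_star, values, grads)
        gg = g @ g
        if gg <= gtol**2:                           # assumed stopping rule
            break
        lam = 1.0                                   # Armijo rule: lam = beta^k, k = 0, 1, ...
        while psi_p(x - lam * g, p_star, values) - val > -alpha * lam * gg:
            lam *= beta
            if lam < 1e-14:                         # assumed round-off safeguard
                return x, p_star
        x = x - lam * g
    return x, p_star


values, grads = make_problem([[1, 0], [-1, 0], [0, 1], [0, -1]])
t, delta = 0.05, 0.5
x, p_star = smoothing_armijo_gradient([1.0, 0.5], t, delta, values, grads)
print(p_star, values(x).max() - 0.5)                # psi(x) - psi* should be <= t
```

Only the smoothed function and its gradient are used inside the loop; the nonsmooth $\psi$ is evaluated once at the end to check the error against $t$.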

In the proposition above, the number of functions is constrained to be two or more ($q \ge 2$), as it is not meaningful to take the pointwise maximum of a single function. For a fixed $p > 0$, the rate of convergence of the Armijo Gradient Method as applied to $(\mathrm{FMX}_p)$ is well known (see for example p. 60 of Polak, 1997). However, the value of the precision parameter $p^*$ in Algorithm II.1 is dictated by $q$ and $t$ (see Step 1), which complicates the analysis. For large values of $q$ or small values of $t$, $p^*$ is large, and hence $(\mathrm{FMX}_{p^*})$ may be ill-conditioned, as observed empirically (Polak et al., 2003). In this chapter, we quantify the ill-conditioning of $(\mathrm{FMX}_p)$ as a function of $p$ and obtain complexity and rate-of-convergence results for Algorithm II.1.

1.  Ill-Conditioning  of  Smoothed  Problem 

The  following  strong  convexity  assumption  is  a  standard  assumption  required 
for  complexity  and  rate  of  convergence  analyses. 

Assumption II.4. The functions $f^j(\cdot)$, $j \in \mathbb{N}$, are

(i) twice continuously differentiable and

(ii) there exists an $m > 0$ such that

$$m \|y\|^2 \le \langle y, \nabla^2 f^j(x) y \rangle, \tag{II.12}$$

for all $x, y \in \mathbb{R}^d$, and $j \in \mathbb{N}$. □

Lemma II.5. Suppose that Assumption II.4 holds. Then for any $x, y \in \mathbb{R}^d$, $q \in \mathbb{N}$, and $p > 0$,

$$m \|y\|^2 \le \langle y, \nabla^2 \psi_p(x) y \rangle, \tag{II.13}$$

with $m$ as in Assumption II.4.
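Proof. From (II.10), (II.12), and $\sum_{j \in Q} \mu_p^j(x) = 1$, we obtain that

$$\begin{aligned}
\langle y, \nabla^2 \psi_p(x) y \rangle &= \sum_{j \in Q} \mu_p^j(x) \langle y, \nabla^2 f^j(x) y \rangle + p \sum_{j \in Q} \mu_p^j(x) \langle y, \nabla f^j(x) \nabla f^j(x)^T y \rangle - p \Big\langle y, \Big[\sum_{j \in Q} \mu_p^j(x) \nabla f^j(x)\Big]\Big[\sum_{j \in Q} \mu_p^j(x) \nabla f^j(x)\Big]^T y \Big\rangle \\
&= \sum_{j \in Q} \mu_p^j(x) \langle y, \nabla^2 f^j(x) y \rangle + p \sum_{j \in Q} \mu_p^j(x) \langle y, \nabla f^j(x) \rangle^2 - p \Big\langle y, \sum_{j \in Q} \mu_p^j(x) \nabla f^j(x) \Big\rangle^2 \\
&\ge m \|y\|^2 + p \sum_{j \in Q} \mu_p^j(x) \langle y, \nabla f^j(x) \rangle^2 - p \Big\langle y, \sum_{j \in Q} \mu_p^j(x) \nabla f^j(x) \Big\rangle^2.
\end{aligned}$$

Hence, we only need to show that the difference of the last two terms is nonnegative. Let $g : \mathbb{R}^d \to \mathbb{R}$ be the convex function defined by $g(z) = \langle y, z \rangle^2$ for $z \in \mathbb{R}^d$, with $y \in \mathbb{R}^d$ fixed. It follows from Jensen's inequality (see for example p. 6 of Urruty & Baptiste, 1996) that

$$\sum_{j \in Q} \mu_p^j(x)\, g\big(\nabla f^j(x)\big) \ge g\Big(\sum_{j \in Q} \mu_p^j(x) \nabla f^j(x)\Big). \tag{II.14}$$

Since $p > 0$, the result follows. □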

For any matrix $A \in \mathbb{R}^{m \times n}$, we adopt the matrix norm $\|A\| = \max_{\|u\| = 1} \|Au\|$, where $u \in \mathbb{R}^n$. Under Assumption II.4(i), $|f^j(x)|$, $\|\nabla f^j(x)\|$, and $\|\nabla^2 f^j(x)\|$ are bounded on bounded subsets of $\mathbb{R}^d$ for given $j \in \mathbb{N}$.

Assumption II.6. For any bounded set $S \subset \mathbb{R}^d$, there exists a $K \in (0, \infty)$ such that $\max\{ |f^j(x)|, \|\nabla f^j(x)\|, \|\nabla^2 f^j(x)\| \} \le K$ for all $x \in S$, $j \in \mathbb{N}$. □


The assumption above holds, for example, under standard assumptions when $f^j(\cdot)$, $j \in \mathbb{N}$, arise from discretization of semi-infinite max functions. Under this assumption, we obtain the following useful result.


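Lemma II.7. Suppose that Assumptions II.4(i) and II.6 hold. Then for every bounded set $S \subset \mathbb{R}^d$,

$$\langle y, \nabla^2 \psi_p(x) y \rangle \le p L \|y\|^2 \tag{II.15}$$

for all $x \in S$, $y \in \mathbb{R}^d$, $q \in \mathbb{N}$, and $p > 1$, where $L = K + 2K^2$, with $K$ as in Assumption II.6.

Proof. From the theory of matrix algebra, for matrices $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times r}$, and vector $x \in \mathbb{R}^n$, we have that $\|Ax\| \le \|A\| \|x\|$, $\|AB\| \le \|A\| \|B\|$, and $\|x x^T\| = \|x\|^2$ (see for example p. 26 of Gill, Murray, & Wright, 1991). We consider each of the three terms of $\nabla^2 \psi_p(\cdot)$; see (II.10). Recall that $\sum_{j \in Q} \mu_p^j(x) = 1$ for all $x \in \mathbb{R}^d$, $q \in \mathbb{N}$, and $p > 0$. For any $x \in S$, $y \in \mathbb{R}^d$, and $q \in \mathbb{N}$, under Assumption II.6, we obtain for the first term that

$$\Big\langle y, \Big(\sum_{j \in Q} \mu_p^j(x) \nabla^2 f^j(x)\Big) y \Big\rangle \le \|y\|^2 \sum_{j \in Q} \mu_p^j(x) \|\nabla^2 f^j(x)\| \le K \|y\|^2, \tag{II.16}$$

where $K$ is the constant in Assumption II.6 corresponding to $S$. Next, for the second term of $\nabla^2 \psi_p(\cdot)$,

$$\Big\langle y, \sum_{j \in Q} \mu_p^j(x) \nabla f^j(x) \nabla f^j(x)^T y \Big\rangle \le \|y\|^2 \sum_{j \in Q} \mu_p^j(x) \|\nabla f^j(x) \nabla f^j(x)^T\| \le K^2 \|y\|^2. \tag{II.17}$$

For the third term, we obtain that

$$\Big| \Big\langle y, \Big[\sum_{j \in Q} \mu_p^j(x) \nabla f^j(x)\Big]\Big[\sum_{j \in Q} \mu_p^j(x) \nabla f^j(x)\Big]^T y \Big\rangle \Big| \le \|y\|^2 \Big\| \sum_{j \in Q} \mu_p^j(x) \nabla f^j(x) \Big\|^2 \le K^2 \|y\|^2. \tag{II.18}$$

Hence, for all $x \in S$, $y \in \mathbb{R}^d$, $q \in \mathbb{N}$, and $p > 1$, $\langle y, \nabla^2 \psi_p(x) y \rangle \le (K + pK^2 + pK^2)\|y\|^2 \le p(K + 2K^2)\|y\|^2$. □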
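Lemmas II.5 and II.7 bracket the spectrum of $\nabla^2\psi_p(x)$ between $m$ and $pL$, and a small numerical sketch makes the gap concrete. Assuming, for illustration only, $f^j(x) = \frac{1}{2}\|x - a_j\|^2$ (so that $\nabla^2 f^j(x) = I$ and $m = 1$) and the two hypothetical centers $a_1 = (1, 0)$ and $a_2 = (-1, 0)$, the Hessian (II.10) at a point where both functions are active has eigenvalues $1$ and $1 + p$: the smallest stays at $m$, while the largest, and hence the condition number, grows linearly in $p$.

```python
import numpy as np


def hessian_psi_p(x, p, centers):
    """Hessian (II.10) for f^j(x) = 0.5 * ||x - a_j||^2 (grad f^j = x - a_j, Hessian = I)."""
    A = np.asarray(centers, dtype=float)
    G = x - A                                   # rows: grad f^j(x)
    f = 0.5 * np.sum(G**2, axis=1)
    w = np.exp(p * (f - f.max()))
    mu = w / w.sum()                            # weights (II.9)
    gbar = mu @ G                               # sum_j mu_j grad f^j(x)
    first = np.eye(A.shape[1])                  # sum_j mu_j * I, since sum_j mu_j = 1
    second = p * (G.T * mu) @ G                 # p * sum_j mu_j grad f^j grad f^j^T
    third = p * np.outer(gbar, gbar)
    return first + second - third


centers = [[1.0, 0.0], [-1.0, 0.0]]
x = np.array([0.0, 0.3])                        # both functions active here
for p in (1.0, 10.0, 100.0, 1000.0):
    eig = np.linalg.eigvalsh(hessian_psi_p(x, p, centers))
    print(p, eig.min(), eig.max(), eig.max() / eig.min())
```

The smallest eigenvalue never drops below $m = 1$ at any point, as Lemma II.5 guarantees, because the last two terms of (II.10) together form a positive semidefinite matrix.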


Lemma II.7 enables us to quantify the rate of convergence of the Armijo Gradient Method for $(\mathrm{FMX}_p)$ as a function of $p > 1$, which we consider next.
Proposition II.8. Suppose that Assumptions II.4 and II.6 hold. For any bounded set $S \subset \mathbb{R}^d$, there exists a $k \in (0, 1)$ such that the rate of convergence of the Armijo Gradient Method to solve $(\mathrm{FMX}_p)$, initialized by $x_0 \in S$, is linear with coefficient $1 - k/p$ for any $p > 1$ and $q \in \mathbb{N}$. That is, for all sequences $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ generated by the Armijo Gradient Method when applied to $(\mathrm{FMX}_p)$, for any $p > 1$, $q \in \mathbb{N}$, and $x_0 \in S$, we have that

$$\frac{\psi_p(x_{i+1}) - \psi_p^*}{\psi_p(x_i) - \psi_p^*} \le 1 - \frac{k}{p} \quad \text{for all } i \in \mathbb{N}_0. \tag{II.19}$$

Proof. It follows from Lemma II.5, Assumption II.6, and the fact that $x_0 \in S$ that there exists a bounded set $S' \subset \mathbb{R}^d$ such that all sequences generated by the Armijo Gradient Method on $(\mathrm{FMX}_p)$, initialized by $x_0 \in S$, are contained in $S'$ for all $p > 1$, $q \in \mathbb{N}$, and $x_0 \in S$. Let $m$ be as in Assumption II.4 and $K$ be the constant in Assumption II.6 corresponding to $S'$. In view of Lemmas II.5 and II.7,

$$m \|y\|^2 \le \langle y, \nabla^2 \psi_p(x) y \rangle \le p L \|y\|^2, \tag{II.20}$$

for all $x \in S'$, $y \in \mathbb{R}^d$, $q \in \mathbb{N}$, and $p > 1$, where $L = K + 2K^2$. Hence, we deduce from Theorem 1.3.7 of Polak (1997) that the rate of convergence of the Armijo Gradient Method to solve $(\mathrm{FMX}_p)$ is linear with coefficient $1 - 4m\beta\alpha(1 - \alpha)/(pL) \in (0, 1)$ for all $p > 1$, $q \in \mathbb{N}$, and $x_0 \in S$, where $\alpha, \beta \in (0, 1)$ are the Armijo line search parameters. Hence,

$$k = 4 m \beta \alpha (1 - \alpha) / L, \tag{II.21}$$

which is less than unity because $\alpha(1 - \alpha) \in (0, 1/4]$, $\beta < 1$, and $m \le L$ in view of (II.20). □
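To see what (II.19) and (II.21) imply in numbers, the sketch below computes $k$ from illustrative values of $m$, $K$, $\alpha$, and $\beta$ (our assumptions, not values from this dissertation), together with the number of iterations needed to shrink $\psi_p(x_i) - \psi_p^*$ by a fixed factor under the linear rate $1 - k/p$. The count grows roughly in proportion to $p$, which is the source of the slow convergence discussed after Theorem II.10.

```python
import math


def rate_constant(m, K, alpha, beta):
    """k = 4 m beta alpha (1 - alpha) / L with L = K + 2 K^2, as in (II.21)."""
    L = K + 2.0 * K**2
    return 4.0 * m * beta * alpha * (1.0 - alpha) / L


def iterations_for_reduction(p, k, factor):
    """Smallest n with (1 - k/p)^n <= factor, using the linear rate in (II.19)."""
    return math.ceil(math.log(factor) / math.log(1.0 - k / p))


k = rate_constant(m=1.0, K=2.0, alpha=0.5, beta=0.8)  # illustrative constants
for p in (10.0, 100.0, 1000.0):
    print(p, iterations_for_reduction(p, k, factor=1e-3))
```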


2.  Complexity 

The results above enable us to identify the complexity of Algorithm II.1 under the following assumption on the computational work required for function and gradient evaluations. We let $t_0 = \psi(x_0) - \psi^*$ for a given $x_0 \in \mathbb{R}^d$ and $q \in \mathbb{N}$.

Assumption II.9. There exist constants $a, b \in (0, \infty)$ such that for any $d \in \mathbb{N}$, $j \in \mathbb{N}$, and $x \in \mathbb{R}^d$, the computational work to evaluate either $f^j(x)$ or $\nabla f^j(x)$ is no larger than $a d^b$. □
Assumption II.9 holds for all problem instances considered in this chapter (see Appendix A) and appears reasonable for many practical situations. The following result can easily be modified to account for other assumptions about the work per function and gradient evaluation.


Theorem II.10. Suppose that Assumptions II.4, II.6, and II.9 hold, and that Algorithm II.1 terminates after $n$ iterations with $\psi(x_n) - \psi^* \le t$. Then for any $d \in \mathbb{N}$ and bounded set $S \subset \mathbb{R}^d$, there exist constants $c, c', t' \in (0, \infty)$ such that the computational work until termination for Algorithm II.1 is no larger than

$$c\, \frac{q \log q}{(1 - \delta) t} \log \frac{c'}{\delta t} \tag{II.22}$$

for all $q \in \mathbb{N}$, $q \ge 2$, $x_0 \in S$, $\delta \in (0, 1)$, and $t \in (0, t']$.

Proof. Let $q \ge 2$ and $t \in (0, \log q]$, which ensures that $p^* = (\log q)/[(1 - \delta) t] > 1$. Thus, Proposition II.8 applies, and the number of iterations of the Armijo Gradient Method to generate $\{x_i\}_{i=0}^n$ such that $\psi_{p^*}(x_n) - \psi_{p^*}^* \le \delta t$ is no larger than

$$\left\lceil \frac{\log(\delta t / t_0)}{\log(1 - k/p^*)} \right\rceil, \tag{II.23}$$

where $k$ is the constant in Proposition II.8 corresponding to $S$ and $\lceil \cdot \rceil$ denotes the ceiling operator. In view of Proposition II.3, $x_n$ also satisfies $\psi(x_n) - \psi^* \le t$. Since the main computational work in each iteration of the Armijo Gradient Method is to determine $\nabla \psi_{p^*}(x_i)$, it follows from Assumption II.9 that there exist $a, b < \infty$ such that the computational work in each iteration of the Armijo Gradient Method when applied to $(\mathrm{FMX}_{p^*})$ is no larger than $a q d^b$. Thus, the computational work in Algorithm II.1 to termination at $x_n$ is no larger than (II.23) multiplied by $a q d^b$. Let $f^{1*}$ denote the minimum value of $f^1(\cdot)$, which is finite according to Assumption II.4. Let $K$ be the constant in Assumption II.6 corresponding to $S$. We then find that $t_0 = \psi(x_0) - \psi^* \le K - f^{1*} = c'$ for any $x_0 \in S$ and $q \in \mathbb{N}$. It follows that the computational work in Algorithm II.1 to termination at $x_n$ is no larger than

$$a q d^b \left\lceil \frac{\log(\delta t / c')}{\log(1 - k/p^*)} \right\rceil \tag{II.24}$$

for any $q \in \mathbb{N}$, $q \ge 2$, $x_0 \in S$, $\delta \in (0, 1)$, and $t \in (0, \log q]$. Since $\log x \le x - 1$ for $x \in (0, 1]$, it follows from the choice of $p^*$ that the computational work in Algorithm II.1 to termination at $x_n$ is no larger than

$$a q d^b \left\lceil \frac{\log(\delta t / c')}{\log\big(1 - k(1 - \delta) t / \log q\big)} \right\rceil \le a q d^b \left\lceil \frac{\log q \, \log\big(c'/(\delta t)\big)}{k (1 - \delta) t} \right\rceil \tag{II.25}$$

for all $q \in \mathbb{N}$, $q \ge 2$, $x_0 \in S$, $\delta \in (0, 1)$, and $t \in (0, \min\{\log q, c'\}]$.

There exists a $t' \in (0, \min\{\log q, c'\}]$ such that $\log q \, \log(c'/(\delta t)) / (k(1 - \delta) t) \ge 1$ for all $t \in (0, t']$, $q \in \mathbb{N}$, $q \ge 2$, and $\delta \in (0, 1)$. Since $\lceil y \rceil \le 2y$ for $y \ge 1$, this implies that for all $q \in \mathbb{N}$, $q \ge 2$, $x_0 \in S$, $\delta \in (0, 1)$, and $t \in (0, t']$,

$$a q d^b \left\lceil \frac{\log q \, \log\big(c'/(\delta t)\big)}{k (1 - \delta) t} \right\rceil \le 2 a q d^b \, \frac{\log q \, \log\big(c'/(\delta t)\big)}{k (1 - \delta) t} = \frac{2 a d^b}{k} \left( \frac{q \log q \, \log\big(c'/(\delta t)\big)}{(1 - \delta) t} \right). \tag{II.26}$$

Since $k$ (see (II.21)) depends only on $m$ from Assumption II.4, $K$ from Assumption II.6, and user-defined parameters, the conclusion follows with $c = 2 a d^b / k$. □

We deduce from Theorem II.10 and its proof that the number of iterations of Algorithm II.1 required to achieve a solution with value within $t$ of the optimal value of (FMX) is $O((1/t)\log(1/t))$ for fixed $q \ge 2$, $d \in \mathbb{N}$, and $\delta \in (0, 1)$. This is worse than, for example, the Pshenichnyi-Pironneau-Polak (PPP) min-max algorithm (Algorithm 2.4.1 in Polak, 1997) and the modified conjugate gradient method on pp. 282-283 of Nemirovski and Yudin (1983), which achieve $O(\log(1/t))$. The SQP algorithm in Zhou and Tits (1996) may also require a low number of iterations, as it converges superlinearly, but its complexity in $t$ is unknown. The larger number of iterations for Algorithm II.1 is caused by the fact that the Armijo Gradient Method exhibits a slower rate of convergence as $p$ increases (see Proposition II.8), and a larger $p$ is required in Algorithm II.1 for a smaller $t$.
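The ceilings in (II.24) and (II.25) are easy to evaluate, which makes the $O((1/t)\log(1/t))$ growth in the iteration count visible. In the sketch below, the constants $k$ and $c'$ are illustrative placeholders (in the analysis they come from Proposition II.8 and from Assumptions II.4 and II.6), and the function names are hypothetical.

```python
import math


def iteration_ceiling(t, q, delta, k, c_prime):
    """Ceiling in (II.24), with p* = log(q) / ((1 - delta) t) from Step 1."""
    p_star = math.log(q) / ((1.0 - delta) * t)
    return math.ceil(math.log(delta * t / c_prime) / math.log(1.0 - k / p_star))


def simplified_ceiling(t, q, delta, k, c_prime):
    """Right-hand ceiling in (II.25), obtained from log(1 - x) <= -x."""
    return math.ceil(math.log(q) * math.log(c_prime / (delta * t))
                     / (k * (1.0 - delta) * t))


params = dict(q=100, delta=0.5, k=0.08, c_prime=10.0)  # illustrative values
for t in (1e-1, 1e-2, 1e-3):
    print(t, iteration_ceiling(t, **params), simplified_ceiling(t, **params))
```

Each tenfold reduction of $t$ multiplies the bound by somewhat more than ten, reflecting the extra $\log(1/t)$ factor.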

We next compare the complexity of smoothing algorithms with that of SQP algorithms. We consider a sequence of finite minimax problems with the same number of variables $d$ but an increasing number of functions $q$. This occurs, for example, in the solution of semi-infinite minimax problems using discretization algorithms, which we discuss in Chapter III.

When we also include the work per iteration of Algorithm II.1, we see from Theorem II.10 that for fixed $t \in (0, t']$, $d \in \mathbb{N}$, and $\delta \in (0, 1)$, the complexity is $O(q \log q)$. For comparison, the complexity of SQP and PPP algorithms to achieve a near-optimal solution of (FMX) is larger, as we see next.

The main computational work in an iteration of an SQP algorithm involves solving a convex QP with $d + 1$ variables and $q$ inequality constraints (Zhou & Tits, 1996). After introducing slack variables to convert it into standard form, this subproblem becomes a convex QP with $d + 1 + q$ variables and $q$ equality constraints. Based on Monteiro and Adler (1989), the computational work to solve the converted QP is $O((d + 1 + q)^3)$. Assuming that the number of iterations an SQP algorithm needs to achieve a near-optimal solution of (FMX) is $O(1)$, for fixed $t \in (0, t']$ and $d \in \mathbb{N}$, the complexity of an SQP algorithm to achieve a near-optimal solution of (FMX) is no better than $O(q^3)$. The same result holds for the PPP algorithm. This complexity, when compared with the $O(q \log q)$ of Algorithm II.1, indicates that smoothing algorithms may be more efficient than SQP and PPP algorithms for (FMX) with large $q$. We carry out a comprehensive numerical comparison of smoothing algorithms with SQP and PPP algorithms in Section II.E. We note that the modified conjugate gradient method on pp. 282-283 of Nemirovski and Yudin (1983) may also have a low complexity in $q$, but this depends on its implementation, and the method is only applicable to convex problems.
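For a rough feel for the gap between $O(q\log q)$ and $O(q^3)$, the sketch below compares only the growth terms, with all constants set to one and $d = 10$. These are assumptions for illustration, so the printed ratios indicate trends rather than actual run times.

```python
import math


def smoothing_growth(q):
    return q * math.log(q)          # O(q log q): Algorithm II.1, fixed t, d, delta


def qp_growth(q, d):
    return (d + 1 + q) ** 3         # O((d + 1 + q)^3): one QP subproblem in standard form


for q in (10, 100, 1000, 10000):
    print(q, qp_growth(q, d=10) / smoothing_growth(q))
```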

3.  Optimal  Parameter  Choice 

We see from Theorem II.10 that the computational work in Algorithm II.1 depends on the algorithm parameter $\delta$. In this subsection, we find an “optimal” choice of $\delta$. A direct minimization of (II.22) with respect to $\delta$ appears difficult; thus, we carry out a rate analysis and determine an optimal $\delta$ in that context.

The notation $t \downarrow 0$ means that $t$ approaches zero from above. We first consider the situation as $t \downarrow 0$ and let $\delta_t \in (0, 1)$ be a choice of $\delta$ in Algorithm II.1 for a specific $t$. For fixed $d \in \mathbb{N}$, $q \in \mathbb{N}$, $q \ge 2$, $S \subset \mathbb{R}^d$, and $x_0 \in S$, let $c$ and $c'$ be as in Theorem II.10, and let $w_t$ denote (II.22) viewed as a function of $t > 0$, with $\delta$ replaced by $\delta_t$, i.e.,

$$w_t = \bar{c}\, \frac{\log\big(c'/(\delta_t t)\big)}{(1 - \delta_t) t}, \tag{II.27}$$

with $\bar{c} = c q \log q$, for all $t > 0$. The next result shows that the choice of $\{\delta_t \in (0, 1) \mid t > 0\}$ influences the rate with which $w_t \to \infty$ as $t \downarrow 0$. However, any constant $\delta_t$ for all $t > 0$ results in the slowest possible rate of increase in $w_t$, an asymptotic rate of $1/t$, as $t \downarrow 0$.


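Theorem II.11. For any $\{\delta_t \in (0, 1) \mid t > 0\}$,

$$\limsup_{t \downarrow 0} \frac{\log w_t}{\log t} \le -1. \tag{II.28}$$

If $\delta_t = \sigma \in (0, 1)$ for all $t > 0$, then

$$\lim_{t \downarrow 0} \frac{\log w_t}{\log t} = -1. \tag{II.29}$$

Proof. There exists a $t_1 \in (0, \infty)$ such that $\log(c'/(\delta_t t)) > 1$ for all $t \in (0, t_1]$ and any $\{\delta_t \in (0, 1) \mid t > 0\}$. Hence, for any $t \in (0, \min\{1, t_1\})$ and $\delta_t \in (0, 1)$,

$$\frac{\log w_t}{\log t} = \frac{\log \bar{c}}{\log t} + \frac{\log \log\big(c'/(\delta_t t)\big)}{\log t} - \frac{\log(1 - \delta_t)}{\log t} - \frac{\log t}{\log t} \le \frac{\log \bar{c}}{\log t} - 1, \tag{II.30}$$

and the first part follows. Taking limits in (II.30), with $\delta_t = \sigma$, yields the second part. □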
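Theorem II.11 can be observed numerically by evaluating $\log w_t / \log t$ from (II.27) for a constant $\delta_t = \sigma$. In the sketch, $\bar c$, $c'$, and $\sigma$ are illustrative values, and $\log w_t$ is computed term by term so that very small $t$ cause no overflow. The ratio stays at or below $-1$ and approaches $-1$ only slowly, because of the $\log\log(1/t)$ term.

```python
import math


def log_wt_over_log_t(t, c_bar=1.0, c_prime=1.0, sigma=0.5):
    """log(w_t) / log(t) for w_t in (II.27) with delta_t = sigma (constant)."""
    log_wt = (math.log(c_bar) + math.log(math.log(c_prime / (sigma * t)))
              - math.log(1.0 - sigma) - math.log(t))
    return log_wt / math.log(t)


for t in (1e-3, 1e-12, 1e-100):
    print(t, log_wt_over_log_t(t))
```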

We next consider the situation as $q \to \infty$ and, similar to above, let $\delta_q \in (0, 1)$ be a choice of $\delta$ in Algorithm II.1 for a specific $q \in \mathbb{N}$. For fixed $d \in \mathbb{N}$ and $S \subset \mathbb{R}^d$, let $c$ and $c'$ be as in Theorem II.10. There exists a $t_1 \in (0, \infty)$ such that $\log(c/t) > 0$ and $\log(c'/t) > 1$ for all $t \in (0, t_1]$. For any given $q \in \mathbb{N}$, $q \ge 2$, and $t \in (0, t_1]$, let $w_q$ denote (II.22) viewed as a function of $q$, with $\delta$ replaced by $\delta_q$, i.e.,

$$w_q = \frac{c}{t} \cdot \frac{q \log q \, \log\big(c'/(\delta_q t)\big)}{1 - \delta_q}. \tag{II.31}$$
The next result shows that the choice of $\{\delta_q\}_{q=2}^\infty$ influences the rate with which $w_q \to \infty$ as $q \to \infty$. However, for a sufficiently small tolerance $t > 0$, as above, any constant choice of $\delta_q$ for all $q \in \mathbb{N}$ results in the slowest possible rate of increase in $w_q$ as $q \to \infty$. Hence, any constant $\delta \in (0, 1)$ in Algorithm II.1 is optimal in this sense and results in the asymptotic rate of $q$ as $q \to \infty$.


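Theorem II.12. For any sequence $\{\delta_q\}_{q=3}^\infty$ with $\delta_q \in (0, 1)$, we have that

$$\frac{\log w_q}{\log q} \ge 1 \tag{II.32}$$

for all $q \in \mathbb{N}$, $q \ge 3$, and $t \in (0, t_1]$. If $\delta_q = \sigma$, where $\sigma \in (0, 1)$ is a constant, then

$$\lim_{q \to \infty} \frac{\log w_q}{\log q} = 1. \tag{II.33}$$

Proof. For $q \ge 3$,

$$\frac{\log w_q}{\log q} = \frac{\log(c/t)}{\log q} + 1 + \frac{\log \log q}{\log q} + \frac{\log \log\big(c'/(\delta_q t)\big)}{\log q} - \frac{\log(1 - \delta_q)}{\log q}. \tag{II.34}$$

Since $w_q$ is defined only for $t \in (0, t_1]$, and $\log(c/t) > 0$ and $\log(c'/t) > 1$ for all $t \in (0, t_1]$, every term on the right-hand side of (II.34) other than the constant 1 is nonnegative for $q \ge 3$ (because $\log q > 1$, $\log(c'/(\delta_q t)) > \log(c'/t) > 1$, and $\log(1 - \delta_q) < 0$). It follows that $(\log w_q)/\log q \ge 1$ for all $q \ge 3$, $t \in (0, t_1]$, and $\{\delta_q\}_{q=3}^\infty$. The proof of the second part follows from taking the limit in (II.34). □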
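Likewise, (II.34) can be evaluated to see Theorem II.12 at work. The sketch takes $\log q$ as input so that astronomically large $q$ can be used; the constants $c$, $c'$, and $t$ are illustrative and satisfy $\log(c/t) > 0$ and $\log(c'/t) > 1$. The ratio never drops below 1, whatever $\delta_q$ is, and for a constant $\delta_q$ it tends to 1 slowly.

```python
import math


def log_wq_over_log_q(log_q, delta_q, t=0.1, c=1.0, c_prime=1.0):
    """log(w_q) / log(q) for w_q in (II.31); requires log(q) > 1, i.e., q >= 3."""
    log_wq = (math.log(c / t) + log_q + math.log(log_q)
              + math.log(math.log(c_prime / (delta_q * t)))
              - math.log(1.0 - delta_q))
    return log_wq / log_q


for q_exp in (1, 6, 100):        # q = 10, 10**6, 10**100
    print(q_exp, log_wq_over_log_q(q_exp * math.log(10.0), delta_q=0.5))
```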


4.  Rate  of  Convergence 

The previous subsection considers the effect of the algorithm parameter $\delta$ on the computational work required in Algorithm II.1. This parameter defines the precision parameter through the relationship $p^* = (\log q)/((1 - \delta) t)$; see Step 1 of Algorithm II.1. In this subsection, we do not restrict Algorithm II.1 to this class of choices for $p^*$ and consider any positive value of the precision parameter. In particular, we examine the progress made by Algorithm II.1 after $n$ iterations for different choices of $p^*$. Since the choice may depend on $n$, we denote by $p_n$ the precision parameter used in Algorithm II.1 when terminated after $n$ iterations. We examine the rate of decay of an error bound on $\psi(x_n) - \psi^*$, and also determine the “optimal choice” of $p_n$ that produces the fastest rate of decay of the error bound as $n \to \infty$.

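Suppose that Assumptions II.4 and II.6 hold. For a given bounded set $S \subset \mathbb{R}^d$, let $k$ be as in Proposition II.8 and let $\{x_i\}_{i=0}^n$, with $x_0 \in S$, be a sequence generated by Algorithm II.1 using $p^* = p_n$ for some $p_n > 0$. Then, in view of (II.7) and Proposition II.8,

$$\begin{aligned}
\psi(x_n) - \psi^* &\le \psi_{p_n}(x_n) - \psi_{p_n}^* + \frac{\log q}{p_n} \\
&\le \Big(1 - \frac{k}{p_n}\Big)^n \big(\psi_{p_n}(x_0) - \psi_{p_n}^*\big) + \frac{\log q}{p_n} \\
&\le \Big(1 - \frac{k}{p_n}\Big)^n \big(\psi(x_0) - \psi^*\big) + \frac{2 \log q}{p_n}.
\end{aligned} \tag{II.35}$$

We want to determine the “best” $\{p_n\}_{n=1}^\infty$ such that the error bound on $\psi(x_n) - \psi^*$ defined by the right-hand side of (II.35) decays as fast as possible as $n \to \infty$. We denote that error bound by $e_n$, i.e., for any $n \in \mathbb{N}$,

$$e_n = t_0 \Big(1 - \frac{k}{p_n}\Big)^n + \frac{2 \log q}{p_n}. \tag{II.36}$$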

We need the following trivial technical result.

Lemma II.13. For $x \in [0, 1/2]$, $-2x \le \log(1 - x) \le -x$. □

We next show that $e_n$ asymptotically decays at a rate no faster than $1/n$ as $n \to \infty$, regardless of the choice of $p_n$, and that this rate is attained with a particular choice of $p_n$.


Theorem II.14. The following statements hold for $e_n$ in (II.36):

(i) For any $\{p_n\}_{n=1}^\infty$ with $p_n > 1$ for all $n \in \mathbb{N}$, $\liminf_{n \to \infty} \log e_n / \log n \ge -1$.

(ii) If $p_n = \zeta n / \log n$ for all $n \in \mathbb{N}$, with $\zeta \in (0, k]$, then $\lim_{n \to \infty} \log e_n / \log n = -1$.

(iii) If $p_n = n^{1 - \nu} / \log n$ for all $n \in \mathbb{N}$, with $\nu \in (0, 1)$, then $\lim_{n \to \infty} \log e_n / \log n = -1 + \nu$.
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Proof.  For  any  n  G  N,  we  see  from  (11.36)  that 

(  k 

log  t0  +  n  log  (  1 - 

V  Pn 


log  en  =  log  |^exp 

>  log  ^rriax  |  exp 

=  max  | log  ^exp 
Hence,  for  any  n  G  N,  n  >  1, 
loge 


2  logg 
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log  t0  +  n  log  ( 1 
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log  t0  +  n  log  (  1 
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Pn)  . 
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k  \] 

),log- 

1  ) 

Pn)  \ 

21ogg 

Pn 


log  n 


>  max 


log  t0  n  log  (l  Pn)  log pn  log  2  log  log  q 

1  '  1  )  1  I  1  I  1 


log  n 


log  n 


log  n  log  n  log  n 


(11.37) 


Let  e  >  0.  Then  there  exists  a  n0  G  N  such  that  (log  log  q)/ log  n  >  — e  for  all  n  >  n0. 
If  (logpn)/logn  <  1  and  n  >  max{2,n0},  then 


log  er 


> 


log  pn  log  2  log  log  q  > 


log  Pr 


e  >  -1 


e. 


,  ,  ,  ,  ,  •  (H.38) 

log  n  log  n  log  n  log  n  log  n 

Alternatively, suppose that $(\log p_n)/\log n > 1$. Hence, $n/p_n < 1$, and if $n \geq 2\kappa$, then $\kappa/p_n \in (0, 1/2]$. Based on Lemma II.13 and (II.37),

$$\frac{\log e_n}{\log n} \geq \frac{\log t_0}{\log n} + \frac{n \log(1 - \kappa/p_n)}{\log n} \geq \frac{\log t_0}{\log n} - \frac{2\kappa n}{p_n \log n} \geq \frac{\log t_0}{\log n} - \frac{2\kappa}{\log n} \tag{II.39}$$

for all $n \geq 2\kappa$ such that $(\log p_n)/\log n > 1$. Thus, there exists an $n_1 \geq \max\{n_0, 2\kappa\}$ such that

$$\frac{\log t_0}{\log n} - \frac{2\kappa}{\log n} \geq -1 - \epsilon \tag{II.40}$$

for all $n \geq n_1$. Hence, for all $n \geq n_1$, $(\log e_n)/\log n \geq -1 - \epsilon$. Since $\epsilon$ is chosen arbitrarily, the first part follows. Next, we prove the second part of the theorem.
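From (II.36), with $p_n = \zeta n/\log n$, where $\zeta \in (0, \kappa]$,

$$\log e_n = \log\left[\exp\left(\log t_0 + n \log\left(1 - \frac{\kappa \log n}{\zeta n}\right)\right) + \frac{2\log q \log n}{\zeta n}\right]. \tag{II.41}$$

There exists an $n_2 \in \mathbb{N}$ such that $(\kappa \log n)/(\zeta n) \in [0, 1/2]$ for all $n \geq n_2$. Thus, by Lemma II.13,

$$\log\left[\exp\left(\log t_0 - n\,\frac{2\kappa \log n}{\zeta n}\right) + \frac{2\log q \log n}{\zeta n}\right] \leq \log e_n \leq \log\left[\exp\left(\log t_0 - n\,\frac{\kappa \log n}{\zeta n}\right) + \frac{2\log q \log n}{\zeta n}\right] \tag{II.42}$$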
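for all $n \geq n_2$. We first consider the lower bound in (II.42):

$$\begin{aligned}
\log\left[\exp\left(\log t_0 - \frac{2\kappa \log n}{\zeta}\right) + \frac{2\log q \log n}{\zeta n}\right]
&= \log\left[\frac{2\log q \log n}{\zeta n}\left(\frac{\exp\left(\log t_0 + \log n^{-2\kappa/\zeta}\right)}{2\log q \log n/(\zeta n)} + 1\right)\right] \\
&= \log\frac{2\log q \log n}{\zeta n} + \log\left(\frac{t_0 \zeta n^{1 - 2\kappa/\zeta}}{2\log q \log n} + 1\right).
\end{aligned}\tag{II.43}$$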


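Since $\zeta \in (0, \kappa]$, we have $1 - 2\kappa/\zeta \leq -1$, and by continuity of the $\log(\cdot)$ function,

$$\lim_{n\to\infty} \log\left(\frac{t_0 \zeta n^{1 - 2\kappa/\zeta}}{2\log q \log n} + 1\right) = 0. \tag{II.44}$$

Continuing from (II.43), and using (II.44), we obtain that

$$\lim_{n\to\infty} \frac{\log\frac{2\log q \log n}{\zeta n} + \log\left(\frac{t_0 \zeta n^{1-2\kappa/\zeta}}{2\log q \log n} + 1\right)}{\log n} = \lim_{n\to\infty} \frac{\log 2 + \log\log q + \log\log n - \log\zeta - \log n}{\log n} = -1. \tag{II.45}$$

Similar arguments yield that the upper bound in (II.42), divided by $\log n$, also tends to $-1$, as $n \to \infty$. Hence, the second conclusion follows. The third part of the theorem follows by similar arguments. □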

We see from Theorem II.14 that the “best” choice of $p_n$ is $p_n = \zeta n/\log n$, with $\zeta \in (0, \kappa]$, and that this choice results in an asymptotic rate of decay of the error bound of $1/n$. The constant $\kappa$ may be unknown, as it depends on $m$ of Assumption II.4 and $K$ of Assumption II.6; see (II.21). Consequently, $p_n = \zeta n/\log n$ may be difficult to implement. Theorem II.14 shows that the choice $p_n = n^{1-\nu}/\log n$ with a small $\nu \in (0,1)$ is almost as good (it results in asymptotic rate $1/n^{1-\nu}$ instead of rate $1/n$) and is independent of $\kappa$.
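To see these rates numerically, the sketch below evaluates the error bound in the form used in the proof of Theorem II.14, $e_n = \exp(\log t_0 + n\log(1-\kappa/p_n)) + 2\log q/p_n$, for a fixed precision parameter and for the two schedules of parts (ii) and (iii). The values of $t_0$, $\kappa$, $q$, $\zeta$, $\nu$, and the evaluation points $n$ are arbitrary illustrative choices, not taken from the text. Because $\log e_n/\log n$ converges only logarithmically, the ratios at any finite $n$ still differ noticeably from their limits.

```python
import math

def error_bound(n, p, t0=1.0, kappa=1.0, q=100):
    """e_n = exp(log t0 + n log(1 - kappa/p)) + 2 log q / p, cf. (II.36); needs p > kappa."""
    first = math.exp(math.log(t0) + n * math.log1p(-kappa / p))
    return first + 2.0 * math.log(q) / p

def rate_ratio(n, schedule):
    """log e_n / log n for the precision parameter p_n = schedule(n)."""
    return math.log(error_bound(n, schedule(n))) / math.log(n)

zeta, nu = 1.0, 0.1                    # illustrative; zeta must lie in (0, kappa]
schedules = {
    "fixed p = 1e3": lambda n: 1e3,
    "zeta n / log n": lambda n: zeta * n / math.log(n),
    "n^(1-nu) / log n": lambda n: n ** (1 - nu) / math.log(n),
}
for name, sched in schedules.items():
    print(f"{name:>18}: " + ", ".join(
        f"{rate_ratio(10.0 ** k, sched):+.3f}" for k in (4, 8, 12)))
```

Under these assumptions the fixed-$p$ ratio drifts toward 0, because $e_n$ stalls at $2\log q/p$, while both increasing schedules keep decreasing, with $\zeta n/\log n$ the faster of the two, consistent with parts (ii) and (iii).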

Roughly speaking, a rate of decay of the error bound no better than $1/n$, as indicated by Theorem II.14, means that the number of iterations required to achieve an error tolerance $t$ grows at least at the rate $1/t$ as $t$ approaches zero. In view of Theorem II.11, the rate $1/t$ is attained with the precision parameter choice in Step 1 of Algorithm II.1. Hence, the choice of the precision parameter in Step 1 of Algorithm II.1 cannot be improved.

Theorems II.11 and II.14 indicate that Algorithm II.1 may only converge sublinearly. Nevertheless, Theorem II.10 shows that smoothing algorithms may still yield run times competitive with other algorithms when $q$ is large, owing to their low computational work per iteration. For smoothing algorithms to be competitive in empirical tests, however, we need to go beyond the basic Algorithm II.1 and develop more sophisticated, adaptive precision-adjustment schemes, as discussed next.

D. SMOOTHING ALGORITHMS AND ADAPTIVE PRECISION ADJUSTMENT

The previous section shows that the choice of precision parameter influences the rate of convergence, since the degree of ill-conditioning in $(\mathrm{FMX}_p)$ depends on the precision parameter. This section presents two smoothing algorithms with novel precision-adjustment schemes for (FMX). The results in Polak et al. (2003) and our preliminary numerical tests strongly indicate that adaptive precision-adjustment schemes are superior to static and open-loop schemes in their ability to avoid ill-conditioning. Thus, we focus on adaptive precision-adjustment schemes in our smoothing algorithms.

The first algorithm, Algorithm II.2, follows Algorithm 3.2 in Polak et al. (2008), but uses a much simpler scheme for precision adjustment. The second algorithm, Algorithm II.3, adopts a novel line-search rule that aims to ensure descent in $\psi(\cdot)$ and, if that is not possible, increases the precision parameter. Previous smoothing algorithms (Polak et al., 2003, 2008) do not check for descent in $\psi(\cdot)$. The new algorithms implement active-set strategies adapted from Polak et al. (2008).

We use the following notation. The $\epsilon$-active set, $\epsilon > 0$, is denoted by

$$Q_\epsilon(x) = \{j \in Q \mid \psi(x) - f^j(x) \leq \epsilon\}. \tag{II.46}$$
As in Algorithm 3.2 of Polak et al. (2008), we compute a search direction using a $d \times d$ matrix $B_{p\Omega}(x)$. We consider two options. When

$$B_{p\Omega}(x) = I, \tag{II.47}$$

the $d \times d$ identity matrix, the search direction is equivalent to the steepest descent direction. When

$$B_{p\Omega}(x) = \eta_{p\Omega}(x) I + H_{p\Omega}(x), \tag{II.48}$$

the search direction is a Quasi-Newton direction, where

$$H_{p\Omega}(x) = p\left[\sum_{j\in\Omega} \mu_p^j(x) \nabla f^j(x) \nabla f^j(x)^T - \left(\sum_{j\in\Omega} \mu_p^j(x) \nabla f^j(x)\right)\left(\sum_{j\in\Omega} \mu_p^j(x) \nabla f^j(x)\right)^T\right], \tag{II.49}$$

$$\eta_{p\Omega}(x) = \max\{0, \varphi - e_{p\Omega}(x)\}, \tag{II.50}$$

$\varphi > 0$, and $e_{p\Omega}(x)$ is the smallest eigenvalue of $H_{p\Omega}(x)$. The quantity $\eta_{p\Omega}(x)$ ensures that $B_{p\Omega}(x)$ is positive definite. The Quasi-Newton direction given in (II.48)-(II.50) is adopted from Polak et al. (2008). Polak et al. (2008) observe that when $p \to \infty$, the first term in the Hessian (II.10) becomes negligible; thus, they ignore the first term.
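The quantities in (II.46)-(II.50) can be sketched in a few lines of NumPy. The sketch assumes that $\mu_p^j(x)$ are the softmax weights $\exp(p f^j(x))/\sum_{k\in\Omega}\exp(p f^k(x))$ of the exponential smoothing, normalized over the working set $\Omega$; the function names and the example data are hypothetical. It illustrates why $B_{p\Omega}(x)$ is positive definite: $H_{p\Omega}(x)$ is $p$ times a weighted covariance of the gradients, hence positive semidefinite, and the shift $\eta_{p\Omega}(x)$ lifts the smallest eigenvalue of $B_{p\Omega}(x)$ to at least $\varphi$.

```python
import numpy as np

def eps_active(fvals, eps):
    """Q_eps(x) = {j : psi(x) - f^j(x) <= eps}, cf. (II.46); fvals[j] = f^j(x)."""
    psi = max(fvals)
    return [j for j, v in enumerate(fvals) if psi - v <= eps]

def quasi_newton_matrix(fvals, grads, omega, p, phi=1.0):
    """Return B = eta*I + H, cf. (II.48)-(II.50), and the smoothed gradient.

    fvals[j] = f^j(x); grads[j] = gradient of f^j at x (1-D array of length d).
    """
    v = np.array([fvals[j] for j in omega], dtype=float)
    G = np.array([grads[j] for j in omega], dtype=float)   # |Omega| x d
    w = np.exp(p * (v - v.max()))          # shift by the max to avoid overflow
    mu = w / w.sum()                       # assumed softmax weights mu_p^j(x)
    g = mu @ G                             # sum_j mu^j grad f^j(x)
    H = p * ((G.T * mu) @ G - np.outer(g, g))    # (II.49): p * weighted covariance
    e_min = np.linalg.eigvalsh(H)[0]       # smallest eigenvalue of H
    eta = max(0.0, phi - e_min)            # (II.50)
    return eta * np.eye(G.shape[1]) + H, g

# Two-dimensional example with three functions (values and gradients at some x).
fvals = [1.0, 0.5, -3.0]
grads = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
omega = eps_active(fvals, eps=1.0)         # -> [0, 1]
B, g = quasi_newton_matrix(fvals, grads, omega, p=10.0)
h = np.linalg.solve(B, -g)                 # Quasi-Newton direction
```

The dense solve is only a convenient choice for small $d$; the text requires only that $h_{p\Omega}(x)$ solve the linear system.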

We  next  present  the  two  algorithms  and  proofs  for  their  convergence. 

1. Smoothing Algorithm Based on Optimality Function

We first consider the following smoothing algorithm, with a simple adaptive precision-adjustment scheme.

Algorithm II.2.

Data: $x_0 \in \mathbb{R}^d$.

Parameters and Auxiliary Functions: $\alpha, \beta \in (0,1)$, $p_0 \geq 1$, $\omega = (10\log q)/p_0$, function $B_{p\Omega}(\cdot)$ as in (II.47) or (II.48), $\epsilon_0 > 0$, $\xi > 1$, $\varsigma > 1$, $\varphi \geq 1$.
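Step 1. Set $i = 0$, $j = 0$, $\Omega_0 = Q_{\epsilon_0}(x_0)$.

Step 2. Compute the search direction $h_{p_i\Omega_i}(x_i)$ by solving

$$B_{p_i\Omega_i}(x_i)\, h_{p_i\Omega_i}(x_i) = -\nabla\psi_{p_i\Omega_i}(x_i). \tag{II.51}$$

Step 3. Compute the stepsize $\lambda_i = \beta^{k_i}$, where $k_i$ is the smallest nonnegative integer $k$ (so that $\beta^{k_i}$ is the largest such stepsize) such that

$$\psi_{p_i\Omega_i}(x_i + \beta^k h_{p_i\Omega_i}(x_i)) - \psi_{p_i\Omega_i}(x_i) \leq -\alpha\beta^k \|h_{p_i\Omega_i}(x_i)\|^2 \tag{II.52}$$

and

$$\psi_{p_i\Omega_i}(x_i + \beta^k h_{p_i\Omega_i}(x_i)) - \psi(x_i + \beta^k h_{p_i\Omega_i}(x_i)) \geq -\omega. \tag{II.53}$$

Step 4. Set

$$x_{i+1} = x_i + \lambda_i h_{p_i\Omega_i}(x_i), \tag{II.54}$$

$$\Omega_{i+1} = \Omega_i \cup Q_{\epsilon_i}(x_{i+1}). \tag{II.55}$$

Step 5. Enter Subroutine II.1, and go to Step 2 on exit from Subroutine II.1.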

Subroutine II.1. Adaptive Precision-Parameter Adjustment using Optimality Function

If

$$\theta_{p_i\Omega_i}(x_{i+1}) \geq -\epsilon_i, \tag{II.56}$$

set $x_j^* = x_{i+1}$, set $p_{i+1} = \xi p_i$, set $\epsilon_{i+1} = \epsilon_i/\varsigma$, replace $i$ by $i+1$, replace $j$ by $j+1$, and exit Subroutine II.1.

Else, set $p_{i+1} = p_i$, set $\epsilon_{i+1} = \epsilon_i$, replace $i$ by $i+1$, and exit Subroutine II.1. □

Steps 1 to 4 of Algorithm II.2 are adopted from Algorithm 3.2 of Polak et al. (2008). We note the unusual choice of the right-hand side in (II.52), where $-\|h_{p_i\Omega_i}(x_i)\|^2$ is used instead of the conventional $\langle\nabla\psi_{p_i\Omega_i}(x_i), h_{p_i\Omega_i}(x_i)\rangle$. Test runs show that Algorithm II.2 with $-\|h_{p_i\Omega_i}(x_i)\|^2$ is slightly more efficient than with the conventional $\langle\nabla\psi_{p_i\Omega_i}(x_i), h_{p_i\Omega_i}(x_i)\rangle$. To allow direct comparison with Algorithm 3.2 of Polak et al. (2008), we use $-\|h_{p_i\Omega_i}(x_i)\|^2$ in Algorithm II.2.
The test in (II.53) prevents the construction of a point $x_{i+1}$ where $\psi(x_{i+1})$ is much greater than $\psi(x_i)$ during the early iterations, when the set $\Omega_i$ is small; see Polak et al. (2008).
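A compact sketch of Algorithm II.2 with $B_{p\Omega}(x) = I$ from (II.47) follows. It assumes exponential (log-sum-exp) smoothing over the working set and $\theta_{p\Omega}(x) = -\tfrac12\|\nabla\psi_{p\Omega}(x)\|^2$ for the test (II.56); the function names, the default parameter values, the fixed iteration budget, and the cap of 60 backtracking steps are illustrative assumptions, not part of the algorithm as stated. The toy problem $\psi(x) = \max\{x^2, (x-2)^2\}$ has minimizer $x = 1$; starting from $x_0 = 5$, the working set initially contains only the first function and grows as the iterates move.

```python
import math
import numpy as np

def smoothed(fs, grads, x, p, omega):
    """psi_{p,Omega}(x) and its gradient via log-sum-exp over the working set."""
    vals = np.array([fs[j](x) for j in omega])
    m = vals.max()
    w = np.exp(p * (vals - m))                  # shifted for numerical stability
    mu = w / w.sum()
    grad = sum(mu[k] * grads[j](x) for k, j in enumerate(omega))
    return m + math.log(w.sum()) / p, grad

def algorithm_ii2(fs, grads, x0, p0=1.0, eps0=1.0, alpha=0.5, beta=0.5,
                  xi=2.0, varsigma=2.0, iters=200):
    """Sketch of Algorithm II.2 with B = I (steepest descent on psi_{p,Omega})."""
    q = len(fs)
    psi = lambda x: max(f(x) for f in fs)
    active = lambda x, eps: {j for j in range(q) if psi(x) - fs[j](x) <= eps}
    omega_tol = 10.0 * math.log(q) / p0         # omega in Algorithm II.2
    x, p, eps = np.asarray(x0, dtype=float), p0, eps0
    Omega = active(x, eps)                      # Step 1
    for _ in range(iters):
        W = sorted(Omega)
        val, g = smoothed(fs, grads, x, p, W)
        h = -g                                  # Step 2 with B = I
        step = 1.0
        for _ in range(60):                     # Step 3: Armijo backtracking
            y = x + step * h
            val_y, _ = smoothed(fs, grads, y, p, W)
            if (val_y - val <= -alpha * step * h.dot(h)       # (II.52)
                    and val_y - psi(y) >= -omega_tol):        # (II.53)
                break
            step *= beta
        x = y                                   # Step 4, (II.54)
        Omega |= active(x, eps)                 # (II.55), uses eps_i
        _, g_new = smoothed(fs, grads, x, p, W) # gradient behind theta_{p_i,Omega_i}(x_{i+1})
        if -0.5 * g_new.dot(g_new) >= -eps:     # Subroutine II.1, test (II.56)
            p, eps = xi * p, eps / varsigma
    return x, p

fs = [lambda x: x[0] ** 2, lambda x: (x[0] - 2.0) ** 2]
grads = [lambda x: np.array([2.0 * x[0]]), lambda x: np.array([2.0 * (x[0] - 2.0)])]
x_final, p_final = algorithm_ii2(fs, grads, x0=[5.0])
```

The sketch keeps the working set $\Omega_i$ fixed during the line search and the optimality-function test, as in Steps 3 and 5, and only enlarges it afterwards.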

The key difference between Algorithm II.2 and Algorithm 3.2 of Polak et al. (2008) is the simplified scheme to adjust $p_i$ in Subroutine II.1. This difference calls for a proof of convergence different from that in Polak et al. (2008); our proof is based on consistent approximations; see Section I.D.3. Let $\mathcal{P}$ denote an increasing sequence of positive real numbers that approaches infinity.

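The following result shows that the pairs $((\mathrm{FMX}_{p\Omega}), \theta_{p\Omega}(\cdot))$ in the sequence $\{((\mathrm{FMX}_{p\Omega}), \theta_{p\Omega}(\cdot))\}_{p\in\mathcal{P}}$ are indeed consistent approximations to $((\mathrm{FMX}_\Omega), \theta_\Omega(\cdot))$. This is subsequently used in the proof of convergence of Algorithm II.2.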

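Theorem II.15. Suppose that Assumption II.4(i) holds. Then for any $\Omega \subset Q$, the pairs $((\mathrm{FMX}_{p\Omega}), \theta_{p\Omega}(\cdot))$ in the sequence $\{((\mathrm{FMX}_{p\Omega}), \theta_{p\Omega}(\cdot))\}_{p\in\mathcal{P}}$ are consistent approximations to $((\mathrm{FMX}_\Omega), \theta_\Omega(\cdot))$.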

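Proof. We follow the proofs of Lemmas 4.3 and 4.4 in Polak (2003), but simplify the arguments, as Polak (2003) deals with min-max-min problems. Let $\{p_i\}_{i=0}^\infty$ be a sequence with $p_i \to \infty$, as $i \to \infty$. According to Theorem 3.3.2 of Polak (1997), $(\mathrm{FMX}_{p_i\Omega})$ epi-converges to $(\mathrm{FMX}_\Omega)$, as $i \to \infty$, if and only if (i) for any $x^* \in \mathbb{R}^d$, there exists a sequence $\{x_i\}_{i=0}^\infty$, with $x_i \in \mathbb{R}^d$, such that $x_i \to x^*$, as $i \to \infty$, and $\limsup_{i\to\infty} \psi_{p_i\Omega}(x_i) \leq \psi_\Omega(x^*)$, and (ii) for any sequence $\{x_i\}_{i=0}^\infty$ such that $x_i \in \mathbb{R}^d$ and $x_i \to x^* \in \mathbb{R}^d$, as $i \to \infty$, $\liminf_{i\to\infty} \psi_{p_i\Omega}(x_i) \geq \psi_\Omega(x^*)$.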

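(i) Let $x^* \in \mathbb{R}^d$. Construct a sequence $\{x_i\}_{i=0}^\infty$, where $x_i = x^*$ for all $i$. Obviously, $x_i \to x^*$ as $i \to \infty$. According to Proposition II.1(ii), $\psi_{p\Omega}(x) \to \psi_\Omega(x)$, as $p \to \infty$; this implies that $\limsup_{p\to\infty}\psi_{p\Omega}(x^*) = \liminf_{p\to\infty}\psi_{p\Omega}(x^*) = \psi_\Omega(x^*)$. Therefore, $\limsup_{i\to\infty}\psi_{p_i\Omega}(x_i) = \limsup_{i\to\infty}\psi_{p_i\Omega}(x^*) = \psi_\Omega(x^*)$.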

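(ii) Let $\{x_i\}_{i=0}^\infty$ and $\{p_i\}_{i=0}^\infty$ be arbitrary sequences such that $x_i \to x^*$, $x^* \in \mathbb{R}^d$, and $p_i \to \infty$, as $i \to \infty$. For any $t > 0$, there exists by continuity of $\psi_\Omega(\cdot)$ an $i_0$ such that $|\psi_\Omega(x_i) - \psi_\Omega(x^*)| < t/2$ for all $i \geq i_0$. Moreover, from (II.7), there exists an $i_1$ such that $0 \leq \psi_{p_i\Omega}(x) - \psi_\Omega(x) \leq \log|\Omega|/p_i < t/2$ for all $i \geq i_1$ and all $x \in \mathbb{R}^d$. Then for all $i \geq \max\{i_0, i_1\}$,

$$|\psi_{p_i\Omega}(x_i) - \psi_\Omega(x^*)| \leq |\psi_{p_i\Omega}(x_i) - \psi_\Omega(x_i)| + |\psi_\Omega(x_i) - \psi_\Omega(x^*)| < t.$$

Hence, $\psi_{p_i\Omega}(x_i) \to \psi_\Omega(x^*)$, which in particular gives (ii).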

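We next consider the optimality functions. Let $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ and $\{p_i\}_{i=0}^\infty$, with $p_i > 0$ for all $i$, be arbitrary sequences and $x^* \in \mathbb{R}^d$ be such that $x_i \to x^*$ and $p_i \to \infty$, as $i \to \infty$. Since $\mu_p^j(x) \in (0,1)$ for any $j \in \Omega$, $p > 0$, and $x \in \mathbb{R}^d$, $\{\mu_{p_i}(x_i)\}_{i=0}^\infty$ is a bounded sequence in $\mathbb{R}^{|\Omega|}$ and, according to the Bolzano-Weierstrass Theorem, there exists at least one convergent subsequence. For every such subsequence $K \subset \mathbb{N}_0$, there exists a $\mu_\infty \in \mathbb{R}^{|\Omega|}$ such that $\mu_{p_i}(x_i) \to_K \mu_\infty$, as $i \to \infty$. Moreover, since $\mu_{p_i}(x_i) \in \Sigma_\Omega$ for all $i$ and $\Sigma_\Omega$ is closed, $\mu_\infty \in \Sigma_\Omega$; in particular, $\sum_{j\in\Omega}\mu_\infty^j = 1$.

If $j \notin \Omega(x^*)$, i.e., $f^j(x^*) < \psi_\Omega(x^*)$, then there exist a $t > 0$ and an $i_0 \in \mathbb{N}$ such that $f^j(x_i) - \psi_\Omega(x_i) \leq -t$ for all $i \geq i_0$. Hence, from (II.9), $\mu_{p_i}^j(x_i) \to 0$, as $i \to \infty$, and therefore $\mu_\infty^j = 0$. By continuity of $\nabla f^j(\cdot)$, $j \in \Omega$,

$$\theta_{p_i\Omega}(x_i) = -\frac{1}{2}\left\|\sum_{j\in\Omega}\mu_{p_i}^j(x_i)\nabla f^j(x_i)\right\|^2 \to_K -\frac{1}{2}\left\|\sum_{j\in\Omega}\mu_\infty^j\nabla f^j(x^*)\right\|^2, \tag{II.57}$$

as $i \to \infty$. Since $\mu_\infty \in \Sigma_\Omega$ and $\mu_\infty^j = 0$ for all $j \notin \Omega(x^*)$, we find in view of (II.11) that

$$-\frac{1}{2}\left\|\sum_{j\in\Omega}\mu_\infty^j\nabla f^j(x^*)\right\|^2 = -\sum_{j\in\Omega}\mu_\infty^j\left(\psi_\Omega(x^*) - f^j(x^*)\right) - \frac{1}{2}\left\|\sum_{j\in\Omega}\mu_\infty^j\nabla f^j(x^*)\right\|^2 \leq \theta_\Omega(x^*). \tag{II.58}$$

This completes the proof. □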
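The two facts that drive this proof, the uniform bound $0 \le \psi_{p\Omega}(x) - \psi_\Omega(x) \le \log|\Omega|/p$ from (II.7) as used in part (ii), and the vanishing of $\mu_p^j(x)$ for indices that are not active at the limit point, can be checked numerically at a fixed point. The sketch assumes the exponential smoothing $\psi_p = m + \frac1p\log\sum_j e^{p(f^j - m)}$ with $m = \max_j f^j$ and softmax weights $\mu_p^j$; the function values are arbitrary illustrative numbers.

```python
import math

def smoothing(fvals, p):
    """Return psi_p and the weights mu_p^j for fixed function values f^j(x)."""
    m = max(fvals)
    w = [math.exp(p * (v - m)) for v in fvals]   # shift by the max for stability
    s = sum(w)
    return m + math.log(s) / p, [wj / s for wj in w]

fvals = [2.0, 2.0, 1.5, -1.0]   # indices 0 and 1 attain psi(x) = 2; 2 and 3 are inactive
for p in (1.0, 10.0, 100.0):
    psi_p, mu = smoothing(fvals, p)
    print(f"p={p:6.1f}  psi_p - psi = {psi_p - 2.0:.2e}  mu = {[f'{u:.1e}' for u in mu]}")
```

As $p$ grows, the weights of the inactive indices vanish and the remaining mass splits over the active ones, which is the behavior behind $\mu_\infty^j = 0$ for $j \notin \Omega(x^*)$.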

The  next  result  is  identical  to  Lemma  3.1  in  Polak  et  al.  (2008). 

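Lemma II.16. Suppose that $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ is a sequence constructed by Algorithm II.2. Then there exist an $i^* \in \mathbb{N}_0$ and a set $\Omega^* \subset Q$ such that the working sets $\Omega_i$ satisfy $\Omega_i = \Omega^*$ for all $i \geq i^*$.

Proof. By construction, $\Omega_i \subset \Omega_{i+1}$ for all $i \in \mathbb{N}_0$. Since the set $Q$ is finite, this nested sequence of subsets can change only finitely many times, and the lemma follows. □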

The following result ensures convergence of Algorithm II.2.
Theorem II.17. Suppose that Assumption II.4(i) holds. Then any accumulation point $x^* \in \mathbb{R}^d$ of a sequence $\{x_j^*\}_{j=0}^\infty \subset \mathbb{R}^d$ constructed by Algorithm II.2 satisfies the first-order optimality condition $\theta(x^*) = 0$.

Proof. Let $\Omega^* \subset Q$ and $i^* \in \mathbb{N}_0$ be as in Lemma II.16, where $\Omega_i = \Omega^*$ for all $i \geq i^*$. As Algorithm II.2 has the form of Master Algorithm Model 3.3.12 in Polak (1997) for all $i \geq i^*$, we conclude based on Theorem 3.3.13 of Polak (1997) that any accumulation point $x^*$ of a sequence $\{x_j^*\}_{j=0}^\infty$ constructed by Algorithm II.2 satisfies $\theta_{\Omega^*}(x^*) = 0$. The assumptions required to invoke Theorem 3.3.13 in Polak (1997) are:

(i) Continuity of $\psi_{\Omega^*}(\cdot)$, $\psi_{p\Omega^*}(\cdot)$, $\theta_{\Omega^*}(\cdot)$, and $\theta_{p\Omega^*}(\cdot)$, $p > 0$, which follows by Assumption II.4(i), Proposition II.1(i), Theorem 2.1.6 of Polak (1997), and Proposition II.1(iii), respectively.

(ii) The pairs $((\mathrm{FMX}_{p\Omega^*}), \theta_{p\Omega^*}(\cdot))$ in the sequence $\{((\mathrm{FMX}_{p\Omega^*}), \theta_{p\Omega^*}(\cdot))\}_{p\in\mathcal{P}}$ are consistent approximations to $((\mathrm{FMX}_{\Omega^*}), \theta_{\Omega^*}(\cdot))$, which follows by Theorem II.15.

(iii) If Steps 1 to 4 of Algorithm II.2 are applied repeatedly to $(\mathrm{FMX}_{p\Omega^*})$ with a fixed $p > 0$, then every accumulation point $\hat{x}$ of a sequence $\{x_i\}_{i=0}^\infty$ so constructed must be a stationary point of $(\mathrm{FMX}_{p\Omega^*})$, i.e., $\theta_{p\Omega^*}(\hat{x}) = 0$, which follows by Theorem 3.2 in Polak et al. (2008).

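Since $\theta_{\Omega^*}(x^*) = 0$, from (II.11), there exists a $\hat\mu \in \Sigma_{\Omega^*}$ such that

$$\sum_{j\in\Omega^*}\hat\mu^j\left(\psi_{\Omega^*}(x^*) - f^j(x^*)\right) + \frac{1}{2}\left\|\sum_{j\in\Omega^*}\hat\mu^j\nabla f^j(x^*)\right\|^2 = 0. \tag{II.59}$$

Let $\pi \in \Sigma_Q$, $\pi^j = 0$ for $j \in Q - \Omega^*$, and $\pi^j = \hat\mu^j$ for $j \in \Omega^*$. Moreover, $\psi_{\Omega^*}(x^*) = \psi(x^*)$: for $i \geq i^*$, $\Omega^* = \Omega_{i+1} \supset Q_{\epsilon_i}(x_{i+1})$ by (II.55), and $Q_{\epsilon_i}(x_{i+1})$ contains every index attaining the maximum in $\psi(x_{i+1})$, so $\psi_{\Omega^*}(x_{i+1}) = \psi(x_{i+1})$, and the equality passes to the limit by continuity. Thus, it follows from (II.11) and (II.59) that

$$\theta(x^*) \geq -\sum_{j\in Q}\pi^j\left(\psi(x^*) - f^j(x^*)\right) - \frac{1}{2}\left\|\sum_{j\in Q}\pi^j\nabla f^j(x^*)\right\|^2 = 0. \tag{II.60}$$

Since $\theta(\cdot)$ is a nonpositive function, the result follows. □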

2.  Smoothing  Algorithm  Using  Cost  Descent 

Next, we consider the second smoothing algorithm, which determines the stepsize based on the actual function $\psi(\cdot)$ rather than the smoothed function $\psi_{p\Omega}(\cdot)$.
Algorithm II.3.

Data: $x_0 \in \mathbb{R}^d$.

Parameters and Auxiliary Functions: $\alpha, \beta \in (0,1)$, function $B_{p\Omega}(\cdot)$ as in (II.47) or (II.48), $\epsilon > 0$, $\varphi \geq 1$, $p_0 \geq 1$, $\hat{p} \gg p_0$, $\kappa \gg 1$, $\xi > 1$, $\gamma > 0$, $\nu \in (0,1)$, $\Delta p \geq 1$.

Step 0. Set $i = 0$, $\Omega_0 = Q_\epsilon(x_0)$, $k_{-1} = 0$.

Step 1. Compute $B_{p_i\Omega_i}(x_i)$ and its largest eigenvalue $\sigma^{\max}_{p_i\Omega_i}(x_i)$. If

$$\sigma^{\max}_{p_i\Omega_i}(x_i) > \kappa, \tag{II.61}$$

compute the search direction

$$h_{p_i\Omega_i}(x_i) = -\nabla\psi_{p_i\Omega_i}(x_i). \tag{II.62}$$

Else, compute the search direction $h_{p_i\Omega_i}(x_i)$ by solving the equation

$$B_{p_i\Omega_i}(x_i)\, h_{p_i\Omega_i}(x_i) = -\nabla\psi_{p_i\Omega_i}(x_i). \tag{II.63}$$


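Step 2a. Compute a tentative Armijo stepsize based on the working set $\Omega_i$, starting from the exponent $k_{i-1}$ of the eventual stepsize of the previous iterate, i.e., determine

$$\lambda_{p_i\Omega_i}(x_i) = \max_{l}\ \beta^l \quad \text{s.t.} \quad \psi_{p_i\Omega_i}(x_i + \beta^l h_{p_i\Omega_i}(x_i)) - \psi_{p_i\Omega_i}(x_i) \leq \alpha\beta^l\left\langle\nabla\psi_{p_i\Omega_i}(x_i), h_{p_i\Omega_i}(x_i)\right\rangle, \tag{II.64}$$

where the search over integers $l$ starts at $l = k_{i-1}$, and let $l$ denote the resulting exponent. Set

$$y_i = x_i + \beta^l h_{p_i\Omega_i}(x_i). \tag{II.65}$$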


Step 2b. Forward track from $y_i$ along the direction $h_{p_i\Omega_i}(x_i)$ as long as $\psi(\cdot)$ continues to decrease, using the following subroutine.

Substep 0. Set $l' = l$,

$$z_{l'} = x_i + \beta^{l'} h_{p_i\Omega_i}(x_i) \quad\text{and}\quad z_{l'-1} = x_i + \beta^{l'-1} h_{p_i\Omega_i}(x_i). \tag{II.66}$$

Substep 1. If

$$\psi(z_{l'-1}) < \psi(z_{l'}), \tag{II.67}$$

replace $l'$ by $l' - 1$, set $z_{l'-1} = x_i + \beta^{l'-1} h_{p_i\Omega_i}(x_i)$, and repeat Substep 1.

Else, set $z_i = z_{l'}$.

Substep 2. If $p_i \leq \hat{p}$, go to Step 3. Else, go to Step 4.

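Step 3. If

$$\psi(z_i) - \psi(x_i) \leq -\frac{\gamma}{p_i^\nu}, \tag{II.68}$$

set $x_{i+1} = z_i$, $p_{i+1} = p_i$, $k_i = l'$, set $\Omega_{i+1} = \Omega_i \cup Q_\epsilon(x_{i+1})$, replace $i$ by $i+1$, and go to Step 1.

Else, replace $p_i$ by $\xi p_i$, replace $\Omega_i$ by $\Omega_i \cup Q_\epsilon(z_i)$, and go to Step 1.

Step 4. If (II.68) holds, set $x_{i+1} = z_i$, $k_i = l'$, set $p_{i+1} = p_i + \Delta p$, set $\Omega_{i+1} = \Omega_i \cup Q_\epsilon(x_{i+1})$, replace $i$ by $i+1$, and go to Step 1.

Else, set $x_{i+1} = y_i$, $k_i = l$, set $p_{i+1} = p_i + \Delta p$, set $\Omega_{i+1} = \Omega_i \cup Q_\epsilon(x_{i+1})$, replace $i$ by $i+1$, and go to Step 1. □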

As is standard in stabilized Newton methods (see for example Section 1.4.4 of Polak, 1997), Algorithm II.3 switches to the steepest descent direction if $B_{p\Omega}(\cdot)$ is given by (II.48) and the largest eigenvalue of $B_{p\Omega}(\cdot)$ is large; see Step 1. Compared to Algorithm 3.2 in Polak et al. (2008), which increases $p$ when $\|\nabla\psi_{p_i\Omega_i}(x_i)\|$ is small, Algorithm II.3 increases the precision parameter only when it does not produce sufficient descent in $\psi(\cdot)$, as verified by the test (II.68) in Steps 3 and 4 of Algorithm II.3. A small precision parameter may produce an ascent direction in $\psi(\cdot)$ due to the poor accuracy of $\nabla\psi_{p_i\Omega_i}(\cdot)$. Thus, insufficient descent is a signal that the precision parameter may be too small. All existing smoothing algorithms only ensure that $\psi_{p_i\Omega_i}(\cdot)$ decreases at each iteration, but do not ensure descent in $\psi(\cdot)$. Another change compared to Polak et al. (2003, 2008) relates to the line search. All smoothing algorithms are susceptible to ill-conditioning and small stepsizes. To counteract this difficulty, Algorithm II.3 moves forward along the search direction starting from the Armijo step, and stops when the next step is not a descent step in $\psi(\cdot)$; see Step 2b.
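Step 2b can be sketched as a small routine. In this sketch the names `forward_track` and `max_steps` are hypothetical, `psi` stands for the true max function $\psi(\cdot)$, and `max_steps` is an added safeguard against the jamming case described in Theorem II.22(ii), which the algorithm as stated does not bound.

```python
def forward_track(psi, x, h, beta, l, max_steps=1000):
    """Step 2b of Algorithm II.3: enlarge the step beta**l' while psi keeps decreasing.

    Returns (z_i, l') with z_i = x + beta**l' * h, the last point of the run.
    x and h may be floats or NumPy arrays; psi is the true max function.
    """
    lp = l
    z = x + beta ** lp * h                  # z_{l'}, initially the Armijo point y_i
    z_next = x + beta ** (lp - 1) * h       # z_{l'-1}, a longer step (beta < 1)
    for _ in range(max_steps):
        if psi(z_next) < psi(z):            # test (II.67)
            lp -= 1
            z, z_next = z_next, x + beta ** (lp - 1) * h
        else:
            return z, lp                    # Substep 1, "Else": z_i = z_{l'}
    raise RuntimeError("forward tracking did not stop; psi may decrease without bound along h")

# Usage: psi(x) = (x - 10)^2 from x = 0 along h = 1, starting at the Armijo exponent l = 3.
z, lp = forward_track(lambda x: (x - 10.0) ** 2, 0.0, 1.0, 0.5, 3)
```

Starting from the step $\beta^3 = 0.125$, the routine doubles the step while $\psi$ decreases and stops at the step 8, because the next candidate 16 is worse.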

Algorithm II.3 has two rules for increasing $p_i$. In the early stages of the calculations, i.e., when $p_i \leq \hat{p}$, if sufficient descent in $\psi(\cdot)$ is achieved when moving from $x_i$ to $z_i$ ((II.68) satisfied), then Algorithm II.3 sets the next iterate $x_{i+1}$ to $z_i$ and retains the current value of the precision parameter, as progress is being made towards an optimal solution of (FMX). However, if (II.68) fails, then there is insufficient descent, and the precision parameter and the working set are modified to generate a better search direction in the next iteration. In the late stages of the calculations, i.e., when $p_i > \hat{p}$, Algorithm II.3 accepts every new point generated, even those with insufficient descent, and increases the precision parameter by the constant amount $\Delta p$.
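The decision logic of Steps 3 and 4 (which point to accept and how to update $p_i$) can be isolated in a hypothetical helper, sketched here with scalar inputs; the helper's name and its return convention are illustrative only. When the early-stage test fails, the sketch returns `None` for the point, mirroring Step 3, where the iteration counter is not advanced and only $p_i$ and $\Omega_i$ change.

```python
def precision_rule(psi_x, psi_z, p, p_hat, gamma, nu, xi, delta_p):
    """Steps 3-4 of Algorithm II.3 (hypothetical helper).

    psi_x = psi(x_i), psi_z = psi(z_i). Returns (point, p_next), where point is
    "z" (accept z_i), "y" (accept the Armijo point y_i), or None (stay at x_i).
    """
    sufficient = psi_z - psi_x <= -gamma / p ** nu     # test (II.68)
    if p <= p_hat:                                     # Step 3: early stage
        return ("z", p) if sufficient else (None, xi * p)
    return ("z" if sufficient else "y"), p + delta_p   # Step 4: late stage

# Early stage with sufficient descent: keep p; late stage: always add delta_p.
print(precision_rule(1.0, 0.5, 4.0, 100.0, 1.0, 0.5, 10.0, 1.0))
print(precision_rule(1.0, 1.0, 200.0, 100.0, 1.0, 0.5, 10.0, 1.0))
```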

The next lemma is similar to Lemma II.16.

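Lemma II.18. Suppose that $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ is a sequence constructed by Algorithm II.3. Then there exist an $i^* \in \mathbb{N}_0$ and a set $\Omega^* \subset Q$ such that the working sets $\Omega_i$ satisfy $\Omega_i = \Omega^*$ and $\psi_{\Omega^*}(x_i) = \psi(x_i)$ for all $i \geq i^*$.

Proof. The first part of the proof follows exactly from the proof of Lemma II.16. Next, since $Q_\epsilon(x_i) \subset \Omega_i$ for all $i$ (see Steps 0, 3, and 4 of Algorithm II.3), and $Q_\epsilon(x_i)$ contains every index $j$ with $f^j(x_i) = \psi(x_i)$, we have $\psi_{\Omega^*}(x_i) = \psi(x_i)$ for all $i \geq i^*$. □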

Lemma II.19. Suppose that Assumption II.4(i) holds, and that the sequences $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ and $\{p_i\}_{i=0}^\infty \subset \mathbb{R}$ are generated by Algorithm II.3. Then the following properties hold: (i) the sequence $\{p_i\}_{i=0}^\infty$ is monotonically nondecreasing; (ii) if the sequence $\{x_i\}_{i=0}^\infty$ has an accumulation point, then $p_i \to \infty$, as $i \to \infty$, and $\sum_{i=0}^\infty 1/p_i = +\infty$.

Proof. We follow the framework of the proof of Lemma 3.1 of Polak et al. (2003). (i) The precision parameter is adjusted in Steps 3 and 4 of Algorithm II.3. In Step 3, if (II.68) is satisfied, then $p_{i+1} = p_i$; if (II.68) fails, $p_i$ is replaced by $\xi p_i > p_i$. In Step 4, $p_{i+1} = p_i + \Delta p \geq p_i + 1 > p_i$.

(ii) Suppose that Algorithm II.3 generates the sequence $\{x_i\}_{i=0}^\infty$ with accumulation point $x^* \in \mathbb{R}^d$, but $\{p_i\}_{i=0}^\infty$ is bounded from above. The existence of an upper bound on $p_i$ implies that $p_i \leq \hat{p}$ for all $i \in \mathbb{N}_0$, because if not, Algorithm II.3 will enter Step 4 for the first time at some iteration $i' \in \mathbb{N}_0$, re-enter Step 4 for all $i \geq i'$, and $p_i \to \infty$ as $i \to \infty$. Thus, the existence of an upper bound on $p_i$ implies that Algorithm II.3 never enters Step 4.

The existence of an upper bound on $p_i$ also implies that there exists an iteration $i^* \in \mathbb{N}_0$ such that (II.68) is satisfied for all $i \geq i^*$, because if not, $p_i$ will be replaced by $\xi p_i$ repeatedly, and $p_i \to \infty$ as $i \to \infty$. This means that $\psi(x_{i+1}) - \psi(x_i) \leq -\gamma/p_i^\nu \leq -\gamma/\hat{p}^\nu$ for all $i \geq i^*$, where the second inequality uses $p_i \leq \hat{p}$. Hence, $\psi(x_i) \to -\infty$ as $i \to \infty$. However, by continuity of $\psi(\cdot)$, and $x^*$ being an accumulation point, $\psi(x_i) \to_K \psi(x^*)$, where $K \subset \mathbb{N}_0$ is some infinite subset. This is a contradiction, so $p_i \to \infty$.

Next, we prove that $\sum_{i=0}^\infty 1/p_i = +\infty$. Since $p_i \to \infty$, there exists an iteration $i^* \in \mathbb{N}_0$ such that $p_i > \hat{p}$ for all $i \geq i^*$. This means that the precision parameter is adjusted by the rule $p_{i+1} = p_i + \Delta p$ for all $i \geq i^*$, so that $p_{i^*+k} = p_{i^*} + k\Delta p \leq (p_{i^*} + \Delta p)(k+1)$ for all $k \in \mathbb{N}_0$. The proof is complete by the fact that $\sum_{k=1}^\infty 1/k = \infty$. □

Lemma II.20. Suppose that Assumption II.4(i) holds. Then for every bounded set $S \subset \mathbb{R}^d$ and parameters $\alpha, \beta \in (0,1)$, there exists a $K \in (0, \infty)$ such that, for all $p \geq 1$, $\Omega \subset Q$, and $x \in S$,

$$\psi_{p\Omega}(x + \lambda_{p\Omega}(x) h_{p\Omega}(x)) - \psi_{p\Omega}(x) \leq -\frac{\alpha K \|\nabla\psi_{p\Omega}(x)\|^2}{p}, \tag{II.69}$$

where $\lambda_{p\Omega}(x)$ is the stepsize defined by (II.64) and $h_{p\Omega}(x)$ is the search direction as defined by (II.62) or (II.63), with $p_i$ replaced by $p$, $\Omega_i$ replaced by $\Omega$, and $x_i$ replaced by $x$.

Proof. If $h_{p\Omega}(x)$ is given by (II.63) with $B_{p\Omega}(x)$ as in (II.47), then the result follows by the same arguments as in the proof of Lemma 3.2 of Polak et al. (2003). If $h_{p\Omega}(x)$ is given by (II.63) with $B_{p\Omega}(x)$ as in (II.48), then the result follows by similar arguments as in the proof of Lemma 3.4 of Polak et al. (2003), but the argument deviates to account for (i) the lower bound on the eigenvalues of $B_{p\Omega}(x)$ taking the specific value of 1 in Algorithm II.3, and (ii) an arbitrary $\Omega \subset Q$.

Based on Assumption II.4(i), (II.8), and the assumption that $S$ is a bounded set, there exists a constant $M < \infty$ such that $\|\nabla\psi_{p\Omega}(x)\| \leq M$ for all $p \geq 1$, $\Omega \subset Q$, and $x \in S$. Let

$$S_B = \{x \in \mathbb{R}^d \mid \|x - x'\| \leq M,\ x' \in S\}, \tag{II.70}$$

and let $L < \infty$ be the constant corresponding to $S_B$ such that (II.15) holds for all $x \in S_B$, $v \in \mathbb{R}^d$, and $p \geq 1$. For any real $d \times d$ matrix $A$, let $\|A\|$ denote its induced matrix norm as defined on p. 20. If $A$ is symmetric and positive semidefinite, then

$$\|A\| = \sigma_{\max}, \tag{II.71}$$

where $\sigma_{\max}$ is the largest eigenvalue of $A$; see for example p. 3 of Lang (2000). Now, suppose that $p \geq 1$, $\Omega \subset Q$, and $x \in S$ are such that $h_{p\Omega}(x) = -B_{p\Omega}(x)^{-1}\nabla\psi_{p\Omega}(x)$. Since all induced norms are consistent by definition, $\|h_{p\Omega}(x)\| \leq \|B_{p\Omega}(x)^{-1}\|\,\|\nabla\psi_{p\Omega}(x)\|$.

By construction, $B_{p\Omega}(x)$ is symmetric and positive definite, as the minimum eigenvalue of $B_{p\Omega}(x)$ is at least 1, because $\varphi \geq 1$, and based on (II.50). Thus, $B_{p\Omega}(x)^{-1}$ is symmetric and positive definite; see for example Bertsekas, Nedic, and Ozdaglar (2003, p. 16). Hence, using the fact that the eigenvalues of an inverse matrix are the reciprocals of the eigenvalues of the original matrix, and (II.71),

$$\|h_{p\Omega}(x)\| \leq \|B_{p\Omega}(x)^{-1}\|\,\|\nabla\psi_{p\Omega}(x)\| = \frac{\|\nabla\psi_{p\Omega}(x)\|}{\sigma^{\min}_{p\Omega}(x)} \tag{II.72}$$

and

$$\langle\nabla\psi_{p\Omega}(x), h_{p\Omega}(x)\rangle \leq -\frac{\|\nabla\psi_{p\Omega}(x)\|^2}{\sigma^{\max}_{p\Omega}(x)}, \tag{II.73}$$

where $\sigma^{\min}_{p\Omega}(x)$ and $\sigma^{\max}_{p\Omega}(x)$ are the smallest and largest eigenvalues of $B_{p\Omega}(x)$, respectively. From Step 1 of Algorithm II.3, we see that the direction in (II.63) is selected only when $\sigma^{\max}_{p\Omega}(x) \leq \kappa$, and by construction according to (II.50), $\sigma^{\min}_{p\Omega}(x) \geq 1$. Hence, from (II.72) and (II.73),

$$\|h_{p\Omega}(x)\| \leq \|\nabla\psi_{p\Omega}(x)\| \tag{II.74}$$

and

$$\langle\nabla\psi_{p\Omega}(x), h_{p\Omega}(x)\rangle \leq -\frac{\|\nabla\psi_{p\Omega}(x)\|^2}{\kappa}. \tag{II.75}$$
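It follows directly from (II.62) that (II.74) and (II.75) also hold when $h_{p\Omega}(x) = -\nabla\psi_{p\Omega}(x)$, since $\kappa \geq 1$.

Next, for all $\lambda \in (0,1]$, $x \in S$, and $p \geq 1$, using the Mean-Value Theorem (see for example Section 5.1.28 of Polak, 1997) and Lemma II.7, we have for some $s \in [0,1]$,

$$\begin{aligned}
&\psi_{p\Omega}(x + \lambda h_{p\Omega}(x)) - \psi_{p\Omega}(x) - \alpha\lambda\langle\nabla\psi_{p\Omega}(x), h_{p\Omega}(x)\rangle \\
&\quad= \lambda(1-\alpha)\langle\nabla\psi_{p\Omega}(x), h_{p\Omega}(x)\rangle + \tfrac{1}{2}\lambda^2\left\langle h_{p\Omega}(x), \nabla^2\psi_{p\Omega}(x + s\lambda h_{p\Omega}(x))\, h_{p\Omega}(x)\right\rangle \\
&\quad\leq \lambda(1-\alpha)\langle\nabla\psi_{p\Omega}(x), h_{p\Omega}(x)\rangle + \tfrac{1}{2}\lambda^2 pL\|h_{p\Omega}(x)\|^2 \\
&\quad\leq -\lambda\|\nabla\psi_{p\Omega}(x)\|^2\left(\frac{1-\alpha}{\kappa} - \frac{1}{2}\lambda pL\right),
\end{aligned}\tag{II.76}$$

where the first inequality uses (II.15), which applies because $x + s\lambda h_{p\Omega}(x) \in S_B$ by (II.74), and the second uses (II.74) and (II.75). Let

$$\lambda^* = \min\left\{1, \frac{2(1-\alpha)}{pL\kappa}\right\}. \tag{II.77}$$

Then it follows from (II.76) that, for every $\lambda \in (0, \lambda^*]$, we have

$$\psi_{p\Omega}(x + \lambda h_{p\Omega}(x)) - \psi_{p\Omega}(x) - \alpha\lambda\langle\nabla\psi_{p\Omega}(x), h_{p\Omega}(x)\rangle \leq 0. \tag{II.78}$$

Hence, by (II.78) and the stepsize rule in (II.64),

$$\lambda_{p\Omega}(x) \geq \beta\lambda^* \tag{II.79}$$

for all $p \geq 1$, $\Omega \subset Q$, and $x \in S$. Consequently, by (II.64), (II.75), (II.79), and $p \geq 1$, we have that

$$\begin{aligned}
\psi_{p\Omega}(x + \lambda_{p\Omega}(x)h_{p\Omega}(x)) - \psi_{p\Omega}(x)
&\leq \alpha\lambda_{p\Omega}(x)\langle\nabla\psi_{p\Omega}(x), h_{p\Omega}(x)\rangle \\
&\leq -\alpha\min\left\{\beta, \frac{2\beta(1-\alpha)}{pL\kappa}\right\}\frac{\|\nabla\psi_{p\Omega}(x)\|^2}{\kappa} \\
&\leq -\frac{\alpha}{p\kappa}\min\left\{\beta, \frac{2\beta(1-\alpha)}{L\kappa}\right\}\|\nabla\psi_{p\Omega}(x)\|^2
\end{aligned}\tag{II.80}$$

for all $p \geq 1$, $\Omega \subset Q$, and $x \in S$. Hence, the conclusion follows with

$$K = \frac{1}{\kappa}\min\left\{\beta, \frac{2\beta(1-\alpha)}{L\kappa}\right\}. \tag{II.81}$$

This completes the proof. □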
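The stepsize bound (II.79) can be checked on a quadratic model. In this sketch the smoothed function is replaced by an assumed quadratic $\frac12 x^T A x$, whose curvature bound plays the role of $pL$ (here $pL = \lambda_{\max}(A) = 50$), and $B_{p\Omega}(x) = I$ with $\kappa = 1$; the matrix, the point $x$, and the parameters $\alpha$ and $\beta$ are illustrative choices. Armijo backtracking from $l = 0$ should then never stop below $\beta\lambda^*$, with $\lambda^*$ from (II.77).

```python
import numpy as np

def armijo_step(f, grad, x, h, alpha, beta, max_back=60):
    """Largest beta**l, l = 0, 1, ..., with f(x + t h) - f(x) <= alpha t <grad f(x), h>."""
    fx, slope = f(x), float(grad(x) @ h)
    t = 1.0
    for _ in range(max_back):
        if f(x + t * h) - fx <= alpha * t * slope:
            return t
        t *= beta
    return t

A = np.diag([1.0, 50.0])                # model Hessian; pL = 50 (largest eigenvalue)
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
alpha, beta, kappa, pL = 0.3, 0.5, 1.0, 50.0

x = np.array([1.0, 1.0])
h = -grad(x)                            # steepest descent, i.e. B = I
lam = armijo_step(f, grad, x, h, alpha, beta)
lam_star = min(1.0, 2.0 * (1.0 - alpha) / (pL * kappa))   # (II.77)
print(lam, beta * lam_star)             # accepted step vs. the lower bound in (II.79)
```

For a quadratic the Armijo test holds exactly when $t \le 2(1-\alpha)\|g\|^2/(g^T A g)$, and this threshold is never below $2(1-\alpha)/\lambda_{\max}(A)$, which is why backtracking by the factor $\beta$ cannot overshoot past $\beta\lambda^*$.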
Lemma II.21. Suppose that Assumption II.4(i) holds and that $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ is a bounded sequence generated by Algorithm II.3. Let $\Omega^* \subset Q$ and $i^* \in \mathbb{N}_0$ be as in Lemma II.18, where $\Omega_i = \Omega^*$ for all $i \geq i^*$. Then there exists an accumulation point $x^* \in \mathbb{R}^d$ of the sequence $\{x_i\}_{i=0}^\infty$ such that $\theta_{\Omega^*}(x^*) = 0$.

Proof. For the sake of contradiction, we assume that there exists a $\rho > 0$ such that

$$\liminf_{i\to\infty}\|\nabla\psi_{p_i\Omega^*}(x_i)\| \geq \rho. \tag{II.82}$$

Since $\{x_i\}_{i=0}^\infty$ is a bounded sequence, it has at least one accumulation point according to the Bolzano-Weierstrass Theorem. Hence, by Lemma II.19, $p_i \to \infty$, as $i \to \infty$.

Consider two cases, $x_{i+1} = y_i$ or $x_{i+1} = z_i$ in Algorithm II.3.

If $x_{i+1} = y_i$, then by Lemma II.20, applied with a bounded set $S$ containing $\{x_i\}_{i=0}^\infty$, there exists an $M \in (0, \infty)$ such that

$$\psi_{p_i\Omega^*}(x_{i+1}) - \psi_{p_i\Omega^*}(x_i) \leq -\frac{\alpha M\|\nabla\psi_{p_i\Omega^*}(x_i)\|^2}{p_i} \tag{II.83}$$

for $i \geq i^*$. Hence,


^Pi+1n*  (a:i+i)  -  ipPin*  fa)  =  ipPi+ in*  (xi+i)  -  ifPin*  (ari+i)  +  ipPin*  (xi+i)  -  ipPin*  (xf) 


< 


aM\\W^{xi)\p 

Pi 


(11.84) 


for  i  >  i*,  where  we  have  used  the  fact  from  Proposition  II.l(i)  that 


1pPi+1Cl*  (Xi+l)  'PPiCl*{Xi+ l), 


(11.85) 


for  i  >  i*,  because  pl+ 1  >  p,  from  Lemma  11.19. 

Next,  if  Xi+ 1  =  Zi,  then  (11.68)  is  satisfied.  It  follows  from  (II. 7)  and  Lemma 
11.18  that, 


0 Pi+1n *  (®i+i)  -  'PpiCi*  {xi)  <  -0a*  (®i+i)  +  lQg^  ^  (a*) 

Pi+ 1 

=  if(xi+i)  +  IOg^  -  -  4>(Xi) 

Pi+l 

<  _^f_  +  log  1^1 

_  Piv  Pi 

~7  +  PiV~l  log  1^*1 

Piu 


(11.86) 
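From (II.84) and (II.86), for all $i \geq i^*$,

$$\psi_{p_{i+1}\Omega^*}(x_{i+1}) - \psi_{p_i\Omega^*}(x_i) \leq \max\left\{-\frac{\alpha M\|\nabla\psi_{p_i\Omega^*}(x_i)\|^2}{p_i},\ \frac{-\gamma + p_i^{\nu-1}\log|\Omega^*|}{p_i^\nu}\right\}. \tag{II.87}$$

By Proposition II.1(iii), $\|\nabla\psi_{p_i\Omega^*}(x_i)\|$ is bounded because $\{x_i\}_{i=0}^\infty$ is bounded; say $\|\nabla\psi_{p_i\Omega^*}(x_i)\| \leq G$ for all $i$. Multiplying the inequality (II.88) below by $p_i^\nu$ shows that it is equivalent to $\gamma \geq p_i^{\nu-1}\left(\log|\Omega^*| + \alpha M\|\nabla\psi_{p_i\Omega^*}(x_i)\|^2\right)$, whose right-hand side is at most $p_i^{\nu-1}(\log|\Omega^*| + \alpha M G^2) \to 0$, since $\nu \in (0,1)$ and $p_i \to \infty$. Hence, there exists an $i^{**} \in \mathbb{N}_0$, where $i^{**} \geq i^*$, such that

$$-\frac{\alpha M\|\nabla\psi_{p_i\Omega^*}(x_i)\|^2}{p_i} \geq \frac{-\gamma + p_i^{\nu-1}\log|\Omega^*|}{p_i^\nu} \tag{II.88}$$

for all $i \geq i^{**}$. Therefore, from (II.87),

$$\psi_{p_{i+1}\Omega^*}(x_{i+1}) - \psi_{p_i\Omega^*}(x_i) \leq -\frac{\alpha M\|\nabla\psi_{p_i\Omega^*}(x_i)\|^2}{p_i} \tag{II.89}$$

for all $i \geq i^{**}$. Since, by Lemma II.19, $\sum_{i=0}^\infty 1/p_i = +\infty$, it follows from (II.82) and (II.89) that

$$\psi_{p_i\Omega^*}(x_i) \to -\infty, \quad \text{as } i \to \infty. \tag{II.90}$$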


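Let $x^*$ be an accumulation point of $\{x_i\}_{i=0}^\infty$. That is, there exists an infinite subset $K \subset \mathbb{N}_0$ such that $x_i \to_K x^*$. Based on (II.7), Lemma II.19, and continuity of $\psi_{\Omega^*}(\cdot)$, it follows that $\psi_{p_i\Omega^*}(x_i) \to_K \psi_{\Omega^*}(x^*)$, as $i \to \infty$, which contradicts (II.90). Hence, $\liminf_{i\to\infty}\|\nabla\psi_{p_i\Omega^*}(x_i)\| = 0$. Consequently, since $\{x_i\}_{i=0}^\infty$ is bounded, there exist an infinite subset $K^* \subset \mathbb{N}_0$ and an $x^* \in \mathbb{R}^d$ such that $x_i \to_{K^*} x^*$ and $\theta_{p_i\Omega^*}(x_i) \to_{K^*} 0$, as $i \to \infty$, which implies that $\limsup_{i\to\infty,\, i\in K^*}\theta_{p_i\Omega^*}(x_i) \geq 0$. From Definition I.3, Theorem II.15, and the fact that $\theta_{\Omega^*}(\cdot)$ is a nonpositive function, $\theta_{\Omega^*}(x^*) = 0$. □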


Theorem II.22. Suppose that Assumption II.4(i) holds. (i) If Algorithm II.3 constructs a bounded sequence $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$, then there exists an accumulation point $x^* \in \mathbb{R}^d$ of the sequence $\{x_i\}_{i=0}^\infty$ that satisfies $\theta(x^*) = 0$. (ii) If Algorithm II.3 constructs a finite sequence $\{x_i\} \subset \mathbb{R}^d$, then Step 2b constructs an unbounded infinite sequence $\{z_{l'}\}$ with

$$\psi(z_{l'-1}) < \psi(z_{l'}) \tag{II.91}$$

for all $l' \in \{l, l-1, l-2, \ldots\}$, where $\beta^l$ is the tentative Armijo stepsize computed in Step 2a.
Proof. First, we consider (i). Let the set $\Omega^* \subset Q$ be as in Lemma II.18, where $\Omega_i = \Omega^*$ for all $i \geq i^*$. Based on Lemma II.21, there exists an accumulation point $x^* \in \mathbb{R}^d$ of the sequence $\{x_i\}_{i=0}^\infty$ such that $\theta_{\Omega^*}(x^*) = 0$. The conclusion then follows by similar arguments as in Theorem II.17, using $\psi_{\Omega^*}(x_i) = \psi(x_i)$ for all $i \geq i^*$ from Lemma II.18.

We next consider (ii). Algorithm II.3 constructs a finite sequence only if it jams in Step 2b. Then Substep 1 constructs an infinite sequence satisfying (II.91) for all $l' \in \{l, l-1, l-2, \ldots\}$. The infinite sequence is unbounded since $h_{p_i\Omega_i}(x_i) \neq 0$, as (II.91) cannot hold otherwise, and $\beta \in (0,1)$. □

3.  Complexity 

Next, we consider the complexity in $q$, for a fixed $d \in \mathbb{N}$, of Algorithms II.2 and II.3 to achieve a near-optimal solution of (FMX). Suppose that all functions $f^j(\cdot)$ are active, i.e., $\Omega^* = Q$, near an optimal solution. If $B_{p\Omega}(\cdot)$ is given by (II.47), then the main computational work in each iteration of Algorithms II.2 and II.3 is the calculation of $\nabla\psi_p(\cdot)$, which takes $O(q)$ arithmetic operations under Assumption II.9; see the proof of Theorem II.10. If $B_{p\Omega}(\cdot)$ is given by (II.48), then the main computational work is the calculation of (II.48) and $h_{p\Omega}(x)$. Under Assumption II.9, it takes $O(q)$ arithmetic operations to compute $\mu_p^j(x)$ for all $j \in Q$, $O(q)$ to compute $\nabla f^j(x)$ for all $j \in Q$, $O(q)$ to sum $\nabla f^j(x)\nabla f^j(x)^T$, and $O(q)$ to compute $\nabla\psi_p(x)$; the other operations take $O(1)$. In all, the number of arithmetic operations to obtain $B_{p\Omega}(x)$ is $O(q)$. The cost of computing $h_{p\Omega}(x)$ by solving a linear system of equations with a direct method depends on $d$ but is constant in $q$. Hence, if $B_{p\Omega}(\cdot)$ is given by (II.48), the computational work in each iteration of Algorithms II.2 and II.3 is $O(q)$. It is unclear how many iterations Algorithms II.2 and II.3 would need to achieve a near-optimal solution as a function of $q$. However, since they may utilize Quasi-Newton search directions and adaptive precision adjustment, there is reason to believe that the number of iterations will be no larger than that of Algorithm II.1, which uses the steepest descent direction and a fixed precision parameter. Thus, suppose that, for some tolerance $t > 0$, the number of iterations of Algorithms II.2 and II.3 needed to generate $\{x_i\}_{i=0}^{n}$, with the last iterate satisfying $\psi(x_n) - \psi^* \le t$, is no larger than $O(\log q)$, as is the case for Algorithm II.1. Then the complexity of Algorithms II.2 and II.3 to generate $x_n$ is no larger than $O(q \log q)$, which is the same as for Algorithm II.1.
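To make the per-iteration cost argument concrete, the Python sketch below assumes a standard exponential (log-sum-exp) smoothing function $\psi_p(x) = \frac{1}{p}\log\sum_{j \in Q}\exp(p f^j(x))$; it is an illustration, not a transcription of (II.47) or (II.48). The matrix `B` is a hypothetical Gauss-Newton-type stand-in for (II.48), obtained from the Hessian of $\psi_p$ by dropping the Hessians of the $f^j$. Each pass over the $q$ functions costs $O(q)$ operations for fixed $d$, while the $d \times d$ solve does not depend on $q$.

```python
import numpy as np


def smoothed_max(fvals, grads, p):
    """Log-sum-exp smoothing of max_j f^j(x) (assumed form, see lead-in).

    fvals: shape (q,), the values f^j(x) for j in Q.
    grads: shape (q, d), row j holds the gradient of f^j at x.
    Returns psi_p(x), the weights mu^j, and the gradient of psi_p at x.
    """
    fmax = fvals.max()
    w = np.exp(p * (fvals - fmax))      # shift by the max to avoid overflow; O(q)
    mu = w / w.sum()                    # nonnegative weights that sum to one
    psi_p = fmax + np.log(w.sum()) / p  # smoothed maximum; O(q)
    grad = mu @ grads                   # sum_j mu^j grad f^j; O(q d)
    return psi_p, mu, grad


def quasi_newton_direction(mu, grads, grad, p, delta=1e-6):
    """Direction h = -B^{-1} grad for a hypothetical B (not (II.48) verbatim).

    B = p * (sum_j mu^j g_j g_j^T - grad grad^T) + delta * I is the Hessian of
    psi_p with the Hessians of the f^j dropped, plus a small regularization
    delta so that B is positive definite. Forming B costs O(q d^2); the
    d x d solve costs O(d^3) and does not depend on q.
    """
    d = grads.shape[1]
    weighted = grads * mu[:, None]                       # O(q d)
    B = p * (grads.T @ weighted - np.outer(grad, grad))  # O(q d^2)
    B += delta * np.eye(d)
    return -np.linalg.solve(B, grad)
```

For any $x$, this smoothing satisfies $\psi(x) \le \psi_p(x) \le \psi(x) + \log(q)/p$, which is why the precision parameter $p$ trades smoothing error against conditioning.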

E.  NUMERICAL  RESULTS 

We present an empirical comparison of Algorithms II.2 and II.3 with algorithms from the literature over a set of problem instances from Polak et al. (2003) and Zhou and Tits (1996), as well as randomly generated instances; see Appendix A. This study appears to be the first systematic comparison of smoothing and SQP algorithms for large-scale problems, with the number of functions $q$ up to two orders of magnitude larger than previously reported. Specifically, we examine:

(i) PPP. Pshenichnyi-Pironneau-Polak min-max algorithm (Algorithm 2.4.1 in Polak 1997).

(ii) ε-PPP. An active-set version of PPP as stated in Algorithm 2.4.34 in Polak (1997); see also Polak (2008).

(iii) SQP-2QP. Algorithm 2.1 of Zhou and Tits (1996), an SQP algorithm with two QPs.

(iv) SQP-1QP. Algorithm A in Zhu et al. (2009), a one-QP SQP algorithm.

(v) SMQN. Algorithm 3.2 in Polak et al. (2008), a smoothing Quasi-Newton algorithm.

(vi) Algorithms II.2 and II.3 of the present chapter.

We refer to Appendix B for details about algorithm parameters. With the exception of PPP and SQP-1QP, the above algorithms incorporate active-set strategies and, hence, appear especially promising for solving problem instances with large $q$. We implement and run all algorithms in MATLAB version 7.7.0 (R2008b) (see Mathworks 2009) on a 3.73 GHz PC using Windows XP SP3, with 3 GB of RAM. All QPs are solved using TOMLAB CPLEX version 7.0 (R7.0.0) (see Tomlab 2009) with the Primal Simplex option, which preliminary studies indicate results in the smallest QP run time. We also examined the LSSOL QP solver (see Gill, Hammarling, Murray, Saunders, & Wright, 1986), but its run times appear inferior to those of CPLEX for the large-scale QPs arising in the present context.

Algorithm 2.1 of Zhou and Tits (1996) is implemented in the solver CFSQP (Lawrence, Zhou, & Tits, 1997), and we have verified that our MATLAB implementation of that algorithm produces results comparable to those of CFSQP in terms of number of iterations and run time. We do not compare directly with CFSQP, as we find it more valuable to compare different algorithms using the same implementation environment (MATLAB) and the same QP solver (CPLEX).

For Algorithm II.3, unless otherwise stated, we use the Quasi-Newton direction with $B_{p\Omega}(x)$ as defined in (II.48), because preliminary test runs show that, generally, the alternative steepest descent direction with $B_{p\Omega}(x)$ as defined in (II.47) produces longer run times. We examine all problem instances from Polak et al. (2003) and Zhou and Tits (1996) except two that cannot easily be extended to large $q$. As the problem instances with many variables in Polak et al. (2003) and Zhou and Tits (1996) do not allow us to adjust the number of functions, we create two additional sets of problem instances; see Appendix A for details. We report run times to achieve a solution $x$ that satisfies

$$\psi(x) - \psi_{\mathrm{target}} \le t, \tag{II.92}$$

where $\psi_{\mathrm{target}}$ is a target value (see Table 17 of Appendix A) equal to the optimal value (if known) or a slightly adjusted value from the optimal values reported in Polak et al. (2003) and Zhou and Tits (1996) for smaller $q$. We use $t = 10^{-5}$. Although this termination criterion cannot be used for real-world problems, where the optimal value is generally unknown, we find that it is the most useful criterion in this study.

Before we can compare the run times of the various algorithms, we need to conduct a sensitivity analysis to determine a robust setting (one that produces the fastest run times for the majority of the problem instances) of the parameter ε (see (II.46)) used in the active-set strategies.
1. Selection of a Robust ε for Active-Set Algorithms

Of the algorithms compared, ε-PPP, SQP-2QP, SMQN, and Algorithms II.2 and II.3 implement some form of active-set strategy. The performance of these active-set algorithms depends on the parameter ε, which defines an ε-active set at each iteration. However, as ε is not used in exactly the same way in the different algorithms, we do not expect the robust ε setting to be the same for all of them.
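As a minimal sketch of what an ε-active set is, the Python snippet below assumes the common definition $\{j \in Q : f^j(x) \ge \psi(x) - \varepsilon\}$ with $\psi(x) = \max_{j \in Q} f^j(x)$; the exact form of (II.46) is not reproduced here, and, as noted above, each algorithm uses ε somewhat differently. A large ε keeps essentially every function, while a tiny ε keeps only the (nearly) maximizing ones. The grid of ε values matches the sweep described in the next paragraphs.

```python
import numpy as np


def eps_active_set(fvals, eps):
    """Indices j with f^j(x) >= max_k f^k(x) - eps (assumed definition)."""
    psi = fvals.max()
    return np.flatnonzero(fvals >= psi - eps)


# Sensitivity grid: eps = 1000, 100, ..., 1e-20 (decreasing by a factor of 10).
EPS_GRID = [10.0 ** k for k in range(3, -21, -1)]
```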

For the sensitivity analysis, we use the same problem instances ProbC, ProbG, and ProbL (see Appendix A) for all active-set algorithms. The three problem instances have different dimensionality $d$, which we hope contributes to a robust setting for ε. We include the non-convex ProbG (see Table 17 of Appendix A) to ensure that the chosen ε is robust for both convex and non-convex problems.

The number of functions $q$ for each test problem is set as high as possible (in powers of 10) without encountering memory problems for any of the algorithms. For each problem instance and active-set algorithm, we determine the run times with $\varepsilon = 1000, 100, \ldots, 10^{-20}$. We present a representative sample of the run times, leaving out (i) those run times that do not change much when we decrease ε by a factor of 10, and (ii) those run times that are significantly longer than the fastest run time.

a. Selection of a Robust ε for ε-PPP

Table 1 indicates that the performance of ε-PPP is sensitive to ε, and no single value of ε is consistently better for the three problem instances considered. The word “local” indicates that the algorithm converges to a locally optimal solution for the non-convex ProbG. The run times with $\varepsilon = 10^{-2}$ to $\varepsilon = 10^{-4}$ appear consistently better than those with other settings, and we use $\varepsilon = 10^{-3}$ for the algorithm comparison study.

| | ProbC | ProbG | ProbL |
|---|---|---|---|
| $d$ | 2 | 4 | $4 \times 10^2$ |
| $q$ | $10^5$ | $10^5$ | $10^2$ |
| $\varepsilon = 1000$ | 869.7 | local | 315.1 |
| $\varepsilon = 100$ | 540.5 | local | 243.3 |
| $\varepsilon = 10$ | 350.9 | local | 190.9 |
| $\varepsilon = 1$ | 71.7 | local | 140.8 |
| $\varepsilon = 10^{-1}$ | 35.5 | local | 101.0 |
| $\varepsilon = 10^{-2}$ | 5.1 | local | 79.2 |
| $\varepsilon = 10^{-3}$ | 5.0 | local | 79.7 |
| $\varepsilon = 10^{-4}$ | 3.1 | local | 104.9 |
| $\varepsilon = 10^{-5}$ | 3.1 | local | 197.1 |
| $\varepsilon = 10^{-10}$ | 31.5 | local | 4246 |
| $\varepsilon = 10^{-15}$ | > 7200 | local | > 7200 |
| $\varepsilon = 10^{-20}$ | > 7200 | local | > 7200 |

Table 1. Run times (in seconds) based on ε for ε-PPP. The word “local” means that the algorithm converges to a locally optimal solution that does not satisfy (II.92), which may occur for non-convex problems.

b. Selection of a Robust ε for SQP-2QP

Table 2 indicates that the performance of SQP-2QP is relatively insensitive to ε. We use $\varepsilon = 1$ for the algorithm comparison, as it provides consistently fast run times, as seen in Table 2, and it is also the value proposed in Zhou and Tits (1996).

| | ProbC | ProbG | ProbL |
|---|---|---|---|
| $d$ | 2 | 4 | $4 \times 10^2$ |
| $q$ | $10^5$ | $10^5$ | $10^2$ |
| $\varepsilon = 1000$ | 1.7 | 2.7 | 21.5 |
| $\varepsilon = 100$ | 0.85 | 2.4 | 21.4 |
| $\varepsilon = 10$ | 0.74 | 2.5 | 21.4 |
| $\varepsilon = 1$ | 0.67 | 2.4 | 15.1 |
| $\varepsilon = 10^{-1}$ | 0.71 | 2.5 | 15.0 |
| $\varepsilon = 10^{-5}$ | 0.76 | 3.2 | 14.3 |
| $\varepsilon = 10^{-10}$ | 0.72 | 3.2 | 14.2 |
| $\varepsilon = 10^{-15}$ | 0.76 | 3.2 | 14.4 |
| $\varepsilon = 10^{-20}$ | 0.68 | 3.1 | 14.3 |

Table 2. Run times (in seconds) based on ε for SQP-2QP.

c. Selection of a Robust ε for SMQN and Algorithm II.2

Algorithm II.2 is very similar to SMQN, the only difference being the schemes for precision-parameter adjustment. Due to their similarity, we conduct the sensitivity analysis with SMQN only, but apply the resulting ε to both algorithms for the algorithm comparison.


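| | ProbC | ProbG | ProbL |
|---|---|---|---|
| $d$ | 2 | 4 | $4 \times 10^2$ |
| $q$ | $10^5$ | $10^5$ | $10^2$ |
| $\varepsilon = 1000$ | 152.6 | 105.2 | 584.8 |
| $\varepsilon = 100$ | 152.5 | 105.5 | 571.3 |
| $\varepsilon = 10$ | 153.0 | 103.6 | 845.3 |
| $\varepsilon = 1$ | 140.0 | 116.5 | 547.8 |
| $\varepsilon = 10^{-1}$ | 112.0 | 108.2 | 153.2 |
| $\varepsilon = 10^{-5}$ | 83.9 | 216.3 | 113.9 |
| $\varepsilon = 10^{-10}$ | 11.8 | 31.2 | 113.9 |
| $\varepsilon = 10^{-15}$ | 12.2 | 29.8 | 114.1 |
| $\varepsilon = 10^{-20}$ | 12.6 | 25.3 | 114.0 |

Table 3. Run times (in seconds) based on ε for SMQN and Algorithm II.2.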


Table 3 clearly indicates that a small ε consistently provides the fastest run times for SMQN. Polak et al. (2008) do not recommend a setting for the parameter ε. We select $\varepsilon = 10^{-20}$ for SMQN and Algorithm II.2 for the algorithm comparison.

d. Selection of a Robust ε for Algorithm II.3

Table 4 indicates that the performance of Algorithm II.3 is sensitive to the value of ε and that no single ε value is optimal for all three problem instances selected. As for SMQN and Algorithm II.2, we use $\varepsilon = 10^{-20}$ for the algorithm comparison.
| | ProbC | ProbG | ProbL |
|---|---|---|---|
| $d$ | 2 | 4 | $4 \times 10^2$ |
| $q$ | $10^5$ | $10^5$ | $10^2$ |
| $\varepsilon = 1000$ | 5.4 | local | 0.34 |
| $\varepsilon = 100$ | 5.4 | local | 0.35 |
| $\varepsilon = 10$ | 3.7 | local | 0.34 |
| $\varepsilon = 1$ | 4.3 | local | 0.77 |
| $\varepsilon = 10^{-1}$ | 3.0 | local | 3.4 |
| $\varepsilon = 10^{-5}$ | 3.5 | 557.4 | 4.3 |
| $\varepsilon = 10^{-10}$ | 0.96 | 27.6 | 4.2 |
| $\varepsilon = 10^{-15}$ | 1.2 | 22.3 | 4.1 |
| $\varepsilon = 10^{-20}$ | 1.3 | 20.1 | 4.6 |

Table 4. Run times (in seconds) based on ε for Algorithm II.3. The word “local” means that the algorithm converges to a locally optimal solution that does not satisfy (II.92), which may occur for non-convex problems.


In view of the above sensitivity analyses, we use the following values of ε to compare the various algorithms in the next subsection: $\varepsilon = 10^{-3}$ for ε-PPP, $\varepsilon = 1$ for SQP-2QP, and $\varepsilon = 10^{-20}$ for SMQN and Algorithms II.2 and II.3.

2.  Comparison 

In this subsection, we compare the algorithms over a set of problem instances from Polak et al. (2003) and Zhou and Tits (1996), as well as randomly generated instances; see Appendix A.

a. Minimizing the Maximum of up to 100,000 Functions

Table 5 summarizes the run times (in seconds) of the various algorithms, with Columns 2 and 3 giving the number of variables $d$ and functions $q$, respectively. Run times in boldface indicate that the particular algorithm has the shortest run time for the specific problem instance. The numerical results in Table 5 indicate that, in most problem instances, the run times are shortest for SQP-2QP or Algorithm II.3. Table 5 also indicates that SQP-2QP is significantly more efficient than SQP-1QP for problem instances ProbA-ProbG. This is due to the efficiency of the active-set strategy in SQP-2QP, which is absent in SQP-1QP. However, for ProbJ-ProbM, SQP-1QP is comparable to SQP-2QP. This is because all the functions are active at the optimal solution of ProbJ-ProbM, which causes the active-set strategy in SQP-2QP to lose its effectiveness as the optimal solution is approached.

Table 5 also indicates that Algorithm II.2 is more efficient than SMQN for most problem instances. As the only difference between the two algorithms lies in their precision-parameter adjustment schemes, this highlights how sensitive the performance of smoothing algorithms is to the control of their precision parameters. Table 5 further shows that Algorithm II.3 is more efficient than Algorithm II.2 and SMQN for most problem instances.

Table 5 indicates that SQP-2QP is generally more efficient than Algorithm II.3 for problem instances with small dimensionality, $d \le 4$ (specifically ProbA-ProbG), whereas Algorithm II.3 is generally more efficient for problem instances with larger $d$. This is consistent with the common observation that SQP-type algorithms may be inefficient for problems with many variables; see, for example, Zhou and Tits (1996).

Table 5 shows that some algorithms return locally optimal solutions for some problem instances (labeled “local” in Table 5). These results suggest that the smoothing algorithms (SMQN and Algorithms II.2 and II.3) tend to find global minima more frequently than PPP and the SQP algorithms.
b. Minimizing the Maximum of up to 1,000,000 Functions

Table 6 presents results similar to those in Table 5, but for larger $q$. We do not present results for PPP and SQP-1QP, as the required QPs exceed the memory limit. The comprehensive sensitivity studies for ε show a significant improvement for Algorithm II.3 on ProbJ-ProbM if a large ε is used. Hence, we include the results for Algorithm II.3 with $\varepsilon = 1000$ in Table 6. This ε-value means that there is effectively no active-set strategy. Sensitivity tests conducted for the other algorithms with a larger ε show no improvement in their run times.

The observations from Table 6 are similar to those for Table 5. Table 6 indicates that Algorithm II.3 with $\varepsilon = 1000$ is efficient for ProbJ-ProbM, which have large $d$ and a significant number of functions active at the optimal solution. For completeness, the run times of Algorithm II.3 with $\varepsilon = 1000$ on the ProbJ-ProbM instances of Table 5 are 2.8, 14.3, 0.36, and 3.0 seconds, respectively, while its run times on the other problem instances are longer than those of Algorithm II.3 with $\varepsilon = 10^{-20}$.

The results in Tables 5 and 6 indicate that, among the algorithms considered, SQP-2QP and Algorithm II.3 are the most efficient for minimax problems with a large number of functions. The run times for ProbJ-ProbM indicate that SQP-2QP is less efficient for problem instances with a significant number of functions nearly active at the solution, as its active-set strategy then loses its effectiveness.
c. Randomly Generated Problem Instances

The problem instances from the literature examined in Tables 5 and 6 include either cases with few functions ε-active at an optimal solution (ProbA-ProbI) or cases with all functions ε-active (ProbJ-ProbM). We also examine randomly generated problem instances with an intermediate number of functions ε-active at the optimal solution; see ProbN in Table 17 of Appendix A. The optimal values are unknown in this case, but the target values given in Table 17 of Appendix A appear to be close to the global minimum values.


| $d$ | $q$ | SQP-2QP ($\varepsilon = 1$) | Algo II.3 SD ($\varepsilon = 1000$) | Algo II.3 QN ($\varepsilon = 1000$) |
|---|---|---|---|---|
| 10 | 10,000 | 0.42 | 0.64 | 0.62 |
| 100 | 10,000 | 0.82 | 0.48 | 0.54 |
| 1,000 | 10,000 | 124.9 | 0.38 | 4.8 |
| 10 | 100,000 | 4.1 | 3.8 | 4.2 |
| 100 | 100,000 | 11.5 | 3.8 | 4.1 |
| 1,000 | 100,000 | mem | 4.3 | 9.7 |
| 1,000 | 1,000,000 | mem | 37.2 | 42.5 |
| 1,000 | 10,000,000 | mem | 421.8 | 492.5 |
| 10,000 | 100,000 | mem | 6.3 | mem |

Table 7. Run times (in seconds) of algorithms on problem instance ProbN. “SD” and “QN” indicate that Algorithm II.3 uses $B_{p\Omega}(\cdot)$ given by (II.47) and (II.48), respectively. The word “mem” indicates that the algorithm terminates due to insufficient memory.


Table 7 presents the run times of Algorithm II.3 and SQP-2QP on ProbN. As these problem instances are relatively well conditioned, Algorithm II.3 with $B_{p\Omega}(\cdot)$ given by (II.47), i.e., a steepest descent (SD) direction, may perform well and is included in the table. The parameter ε for Algorithm II.3 is set to 1000 for this set of problem instances, as preliminary test runs show that this value is consistently better than other choices. Table 7 indicates that SQP-2QP is less efficient than Algorithm II.3 for problem instances with large $d$ and a significant number of functions ε-active at the optimal solution. The last row in Table 7 shows that, for problem instances with $d \ge 10{,}000$, storing a $d \times d$ matrix causes both SQP-2QP and Algorithm II.3 with $B_{p\Omega}(\cdot)$ given by (II.48) to terminate due to memory limitations. Thus, Algorithm II.3 with $B_{p\Omega}(\cdot)$ given by (II.47), which does not require storing any matrix, may be a reasonable alternative when $d$ is large.
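A back-of-the-envelope check of the memory argument, assuming 8-byte double-precision entries (MATLAB's default numeric type): one dense $10{,}000 \times 10{,}000$ matrix already occupies $8 \times 10^8$ bytes, about 0.75 GiB, before any working copies that a factorization or QP solver may make, whereas for $d = 1{,}000$ the same matrix needs under 8 MiB.

```python
def dense_matrix_gib(d, bytes_per_entry=8):
    """Memory in GiB for one dense d x d matrix (8-byte doubles by default)."""
    return d * d * bytes_per_entry / 2**30


for d in (1_000, 10_000):
    print(f"d = {d:>6}: {dense_matrix_gib(d):.3f} GiB per dense d x d matrix")
```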

F.  CONCLUSIONS  FOR  FINITE  MINIMAX 

This chapter focuses on minimizing the maximum of many functions and presents complexity and rate-of-convergence analysis of smoothing algorithms for such problems. We find that smoothing algorithms might only have sublinear rates of convergence, but their complexity in the number of functions is competitive with that of other algorithms due to the small computational work per iteration. We present two smoothing algorithms with novel precision-adjustment schemes and carry out a comprehensive numerical comparison with other algorithms from the literature. We find that the proposed algorithms are more efficient than a recent smoothing algorithm from the literature, due to their more efficient precision-adjustment schemes. The proposed algorithms are competitive with SQP algorithms and are especially efficient for problem instances with many variables or where a significant number of functions are nearly active at stationary points. The numerical results indicate that, due to memory issues, smoothing with first-order gradient methods is likely the only viable approach to solve finite minimax problems with many functions and variables.
III. SEMI-INFINITE MINIMAX PROBLEM


A.  INTRODUCTION 

In this chapter, we consider semi-infinite minimax problems of the form

$$(\mathrm{SMX}) \quad \min_{x \in \mathbb{R}^d} \psi(x), \tag{III.1}$$

where $\psi: \mathbb{R}^d \to \mathbb{R}$ is defined by

$$\psi(x) = \max_{y \in Y} \phi(x, y), \tag{III.2}$$

$Y$ is a compact infinite subset of $\mathbb{R}^m$, $\phi: \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}$, $d, m \in \mathbb{N}$, and $\phi(\cdot, \cdot)$ is continuous and sufficiently smooth on $\mathbb{R}^d \times Y$ as specified below. Note that if $Y$ is a finite set instead, we have the finite minimax problem (FMX) from Chapter II. The notation in each chapter is self-contained; hence, here and below we reuse some symbols from Chapter II in definitions of new quantities. The dimension $m$ of $y$ is, as we see below, a key quantity, and we refer to it as the uncertainty dimension.

In general, (SMX) is used by decision makers to determine the optimal response to the worst-case scenario. (SMX) arises in applications such as finance (Rustem & Howe, 2002), electrical circuit theory (Demyanov & Malozemov, 1974), and policy optimization (Becker, Dwolatzky, Karakitsos, & Rustem, 1986). Solving (SMX) is difficult for two reasons: (i) for any $x \in \mathbb{R}^d$, $\psi(x)$ may not be computable in finite time because of the global maximization involved, and (ii) $\psi(\cdot)$ may not be differentiable everywhere.

Several methods have been proposed to solve (SMX); see Rustem and Howe (2002, Chapter 2) for a survey of semi-infinite minimax algorithms. A key approach to solving (SMX) is the use of semi-infinite programming (SIP) methods. (SMX) can be reformulated as the SIP

$$(\mathrm{SMX}') \quad \min_{(x, z) \in \mathbb{R}^{d+1}} \{ z \mid \phi(x, y) - z \le 0 \ \ \forall y \in Y \}, \tag{III.3}$$

which involves an infinite number of constraints and can then be solved by any SIP algorithm. For fixed $x$, the smallest feasible $z$ is $\max_{y \in Y} \phi(x, y) = \psi(x)$, so (III.3) and (SMX) have the same optimal value and the same optimal $x$. SIPs are usually solved by solving a sequence of finite problems, i.e., problems with a finite number of constraints. Depending on how the finite problems are created, SIP algorithms can generally be grouped into three classes: exchange algorithms, local reduction algorithms, and discretization algorithms; see Lopez and Still (2007), Hettich and Kortanek (1993), and Reemtsen and Gorner (1998) for surveys on the theory, applications, and algorithms of SIP.

In exchange algorithms (Kortanek & No, 1993), at each iterate $(x_i, z_i) \in \mathbb{R}^{d+1}$, $i \in \mathbb{N}$, new constraints $\phi(x, y_i) - z \le 0$ corresponding to a maximizer $y_i \in \arg\max_{y \in Y} \phi(x_i, y)$ are added to the finite problem and existing constraints are removed, i.e., an exchange of constraints occurs. In local reduction algorithms (Price & Coope, 1990), under certain regularity assumptions, the SIP is converted locally into a finite problem.

Discretization  algorithms  are  one  of  the  more  popular  classes  of  algorithms 
for  solving  SIPs  due  to  their  simplicity.  They  create  finite  problems  by  considering 
a  finite  discretized  subset  of  Y.  To  achieve  the  required  solution  tolerance,  most 
discretization  algorithms  implement  some  kind  of  adaptive  discretization  refinement 
rule  to  gradually  increase  the  level  of  discretization,  rather  than  fix  the  discretization 
at  a  high  level  right  from  the  start.  In  this  chapter,  we  refer  to  those  algorithms 
that  are  applied  to  solve  the  individual  discretized  problems  as  algorithm  maps,  to 
differentiate  them  from  the  overall  discretization  algorithm  that  usually  includes  some 
adaptive  discretization  refinement  rule.  At  each  stage  of  the  algorithm,  the  level 
of  discretization  is  fixed  and  an  algorithm  map  is  used  to  solve  the  finite  problem 
approximately.  The  approximate  solution  is  then  usually  used  to  warm-start  the 
next  stage.  We  refer  to  Hettich  (1986);  Reemtsen  (1991);  Polak  and  He  (1992);  Polak 
(1997)  for  examples  of  discretization  algorithms. 

There are also algorithms that directly address (SMX) without the reformulation as a SIP. The algorithms in Chaney (1982) and Klessig and Polak (1973) assume that the maximum in (III.2) occurs at a unique point $y(x)$ for all $x \in \mathbb{R}^d$, which ensures that $\psi(\cdot)$ is differentiable. Smooth optimization methods, such as the method of centers algorithm in Chaney (1982) and a first-order feasible directions method in Klessig and Polak (1973), are then used to minimize the smooth $\psi(\cdot)$. The conceptual algorithm in Panin (1981) and an implementable version in Kiwiel (1987) use a convex piecewise linear approximation of $\psi(\cdot)$ to solve (SMX). As (SMX) belongs to the general class of nonsmooth problems, nonsmooth optimization algorithms such as subgradient and bundle algorithms (Rustem & Howe, 2002) can be used as well. Subgradient algorithms determine the descent direction by computing at least one subgradient at each iterate, while bundle algorithms use subgradient information over several successive iterates to determine the descent direction. A discretization algorithm that does not involve the reformulation into a SIP is proposed in Demyanov and Malozemov (1971). The algorithm solves an infinite sequence of finite minimax problems of the form

$$\min_{x \in \mathbb{R}^d} \max_{y \in Y_N} \phi(x, y), \tag{III.4}$$

where $Y_N$, $N \in \mathbb{N}$, are finite discretized subsets of $Y$. This approach is fundamentally the same as converting (SMX) into a SIP and then applying discretization methods.

In this chapter, we propose a novel way of expressing rate of convergence in terms of computational work instead of the typical number of iterations. We first discuss the inadequacy of the typical rate of convergence. We consider two adaptive discretization algorithms (Polak & He, 1992; Polak, Mayne, & Higgins, 1992) for solving (SMX). Polak and He (1992) propose a set of discretization refinement rules that ensure that their adaptive discretization algorithm generates sequences converging to a solution of the original SIP at the same linear rate, with the same estimated rate constant, as the linearly convergent algorithm map used in the discretization algorithm. A similar study of this rate-preserving idea is found in Polak et al. (1992) for a semi-infinite minimax algorithm that uses an extension of Newton's method as the algorithm map. Polak et al. (1992) state that the rate of convergence of their adaptive discretization algorithm is superlinear. Without further information, a user would probably select the superlinearly convergent algorithm to solve (SMX). However, because the level of discretization must increase rapidly to achieve the superlinear rate, the computational work between iterates of that algorithm also increases rapidly. Thus, the computational time may not be well correlated with the superlinear rate of convergence, since the typical rate of convergence does not account for computational work.

To our knowledge, there has been no rate-of-convergence result that considers computational work for discretization algorithms for SIP and (SMX). That said, not all rate-of-convergence results are in terms of the number of iterations. Still (2001) studies how the rate of convergence of SIP discretization algorithms depends on the level of discretization and on whether the discretization includes boundary points of $Y$ in a specific way. Shapiro (2009) determines the rate of convergence of an ε-optimal solution of the discretized problem to the set of optimal solutions of the SIP, as a function of the level of discretization.

In our proposed way of expressing rate of convergence, we relate computational work to the number of iterations and to the level of discretization through some computational work assumptions. This relation allows us to determine the rate of decay of a bound on the error between the iterates generated from the discretized problems and the optimal solution of (SMX) as a function of computational work, which we refer to as the rate of decay of error bound in the rest of the chapter. We use this new way to develop rate-of-convergence results for various fixed and adaptive discretization algorithms for (SMX) and compare them against the rate of convergence of an ε-subgradient algorithm. We show that the new way allows a fairer comparison of the various algorithms than the typical rate of convergence. We also conduct numerical studies to validate the theoretical results we obtain.

The next section describes the discretization approach and determines the rate of decay of error bound for discretization algorithms using algorithm maps with varying rates of convergence. Section C determines the rate of decay of error bound for an ε-subgradient algorithm and compares it against the discretization algorithms. Section D contains numerical results.


B.  EFFICIENCY  OF  DISCRETIZATION  ALGORITHM 

We  start  this  section  by  describing  the  discretization  approach  for  (SMX)  and 
include  for  completeness  some  known  results  that  we  use  in  later  subsections. 

1.  Discretization 

The discretization approach involves approximating $Y$ by a finite subset $Y_N \subset Y$, where $|Y_N| = N$ ($|\cdot|$ denotes the cardinality operator), and approximately solving the resulting finite minimax problem

$$(\mathrm{SMX}_N) \quad \min_{x \in \mathbb{R}^d} \psi_N(x), \tag{III.5}$$

where $\psi_N: \mathbb{R}^d \to \mathbb{R}$, $N \in \mathbb{N}$, is defined by

$$\psi_N(x) = \max_{y \in Y_N} \phi(x, y). \tag{III.6}$$

Problem $(\mathrm{SMX}_N)$ can be solved using any finite minimax algorithm, such as those in Chapter II. In the remainder of this chapter, we refer to elements of $Y_N$ as grid points. When they exist, we denote the optimal solutions of (SMX) and $(\mathrm{SMX}_N)$ by $x^*$ and $x_N^*$, respectively, and the corresponding optimal values by $\psi^*$ and $\psi_N^*$. We next state some properties of $\psi(\cdot)$ and $\psi_N(\cdot)$.

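Proposition III.1. The following facts hold:

(i) For all $x \in \mathbb{R}^d$ and $N \in \mathbb{N}$, $\psi_N(x) \le \psi(x)$.

(ii) Suppose that $\phi(\cdot, \cdot)$ is continuous on $\mathbb{R}^d \times Y$. Then $\psi(\cdot)$ and $\psi_N(\cdot)$ are continuous on $\mathbb{R}^d$ for any $N \in \mathbb{N}$.

(iii) For all $N \in \mathbb{N}$,

$$\psi_N^* \le \psi^*. \tag{III.7}$$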
Proof. Part (i) follows directly from the definitions of $\psi(\cdot)$ and $\psi_N(\cdot)$ and the fact that $Y_N \subset Y$. Part (ii) follows, for example, from pp. 51 and 187 of Demyanov and Malozemov (1974). For part (iii), by definition of $x_N^*$, $\psi_N(x_N^*) \le \psi_N(x)$ for all $x \in \mathbb{R}^d$ (which includes $x^*$); thus, based on part (i),

$$\psi_N^* = \psi_N(x_N^*) \le \psi_N(x^*) \le \psi(x^*) = \psi^*. \tag{III.8}$$

□ 
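A small numerical illustration of parts (i) and (iii), using a hypothetical toy instance with $d = m = 1$, $Y = [0, 1]$, and $\phi(x, y) = x^2 - (y - 1/3)^2$. Here the maximizer $y = 1/3$ does not depend on $x$, so $\psi(x) = x^2$ and $\psi^* = 0$. The five-point grid $Y_N = \{0, 0.25, 0.5, 0.75, 1\}$ misses $y = 1/3$, so $\psi_N$ lies strictly below $\psi$ and $\psi_N^* = -1/144 < \psi^*$.

```python
import numpy as np


def phi(x, y):
    """Toy phi(x, y) = x**2 - (y - 1/3)**2 (hypothetical example, d = m = 1)."""
    return x**2 - (y - 1.0 / 3.0) ** 2


def psi(x):
    """psi(x) = max over Y = [0, 1]; attained at y = 1/3, so psi(x) = x**2."""
    return x**2


def psi_N(x, grid):
    """psi_N(x) = max of phi(x, y) over the finite grid Y_N, as in (III.6)."""
    return max(phi(x, y) for y in grid)


grid = np.linspace(0.0, 1.0, 5)          # Y_N = {0, 0.25, 0.5, 0.75, 1}
xs = np.linspace(-1.0, 1.0, 201)         # includes x = 0, the minimizer of psi
vals_N = np.array([psi_N(x, grid) for x in xs])
psi_star_N = vals_N.min()                # approximates psi*_N on this x-grid
psi_star = psi(xs).min()                 # psi* = 0
```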

In  this  section,  we  focus  on  the  following  basic  fixed  discretization  algorithm, 
for  which  we  develop  a  series  of  rate  of  decay  of  error  bound  results. 

Algorithm III.1. Fixed Discretization Algorithm
Data: $x_0 \in \mathbb{R}^d$.

Parameters: Discretization parameter $N \in \mathbb{N}$ and parameters required for the algorithm map.

Step 1. Generate a sequence $\{x_n^N\}_{n=0}^{\infty}$ by applying an algorithm map to $(\mathrm{SMX}_N)$. □
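Algorithm III.1 leaves the algorithm map unspecified. The sketch below, assuming a scalar decision variable, passes the map in as a function. The golden-section map used in the demonstration is a hypothetical stand-in for an "ideal" map: it minimizes $\psi_N$ to a tolerance in a single call, which is valid here only because $\psi_N$ is unimodal for this toy $\phi(x, y) = (x - y)^2$.

```python
import math
from typing import Callable, List

Objective = Callable[[float], float]


def fixed_discretization(x0: float,
                         grid: List[float],
                         phi: Callable[[float, float], float],
                         algorithm_map: Callable[[Objective, float], float],
                         n_iters: int) -> List[float]:
    """Sketch of Algorithm III.1 for scalar x: iterate a map on (SMX_N)."""
    def psi_N(x: float) -> float:
        return max(phi(x, y) for y in grid)          # (III.6)

    iterates = [x0]
    for _ in range(n_iters):                          # Step 1
        iterates.append(algorithm_map(psi_N, iterates[-1]))
    return iterates


def golden_section_map(f: Objective, x: float,
                       lo: float = -10.0, hi: float = 10.0, tol: float = 1e-10) -> float:
    """Hypothetical 'ideal' map: ignores x and minimizes a unimodal f on [lo, hi]."""
    inv_gr = (math.sqrt(5.0) - 1.0) / 2.0             # 1 / golden ratio
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_gr * (b - a)
        d = a + inv_gr * (b - a)
        if f(c) < f(d):
            b = d                                     # minimizer lies in [a, d]
        else:
            a = c                                     # minimizer lies in [c, b]
    return 0.5 * (a + b)


grid = [(i + 0.5) / 10 for i in range(10)]
xs = fixed_discretization(3.0, grid, lambda x, y: (x - y) ** 2, golden_section_map, 1)
print(xs[-1])    # close to 0.5, the minimizer of psi_N for this toy grid
```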
We need the following assumptions for the analysis of the rate of decay of error bounds. The operator $\|\cdot\|$ denotes the Euclidean norm.

Assumption III.2. The functions $\phi(x, \cdot)$, $x \in \mathbb{R}^d$, are uniformly Lipschitz continuous in $y$, i.e., there exists a constant $L < \infty$ such that

$$|\phi(x, y) - \phi(x, y')| \le L \|y - y'\| \qquad \text{(III.9)}$$

for all $x \in \mathbb{R}^d$ and $y, y' \in \mathbb{R}^m$. □

We require an assumption on the discretization scheme, which dictates how $Y_N$ is generated from $Y$ given $N \in \mathbb{N}$. We assume that the same discretization scheme is used throughout this chapter for the various algorithms.

Assumption III.3. There exist an $N_1 \in \mathbb{N}$, a discretization scheme defined for all $N \in \mathbb{N}$, $N \ge N_1$, and a monotonically decreasing function $\Delta_m : \mathbb{N} \to \mathbb{R}$, where $m$ is the dimensionality of $Y$ and $\Delta_m(N) \to 0$ as $N \to \infty$, such that

$$0 \le \psi(x) - \psi_N(x) \le \Delta_m(N) \qquad \text{(III.10)}$$

for all $x \in \mathbb{R}^d$ and $N \in \mathbb{N}$, $N \ge N_1$. In addition, there exists an $L' < \infty$ such that the discretization error $\Delta_m(N)$ can be expressed as

$$\Delta_m(N) = \frac{L' \sqrt{m}}{N^{1/m}} \qquad \text{(III.11)}$$

for all $N \in \mathbb{N}$, $N \ge N_1$, $m \in \mathbb{N}$. □

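Under Assumption III.2, Assumption III.3 holds, for example, when $Y$ is the unit cube $[0, 1]^m$ and $Y_N$, $N > 2^m$, is the uniform grid $I_N^m = I_N \times \cdots \times I_N$ ($m$ factors), where in each of the $m$ dimensions

$$I_N = \left\{0, \frac{1}{\lfloor N^{1/m} \rfloor - 1}, \frac{2}{\lfloor N^{1/m} \rfloor - 1}, \ldots, 1\right\} \qquad \text{(III.12)}$$

and $\lfloor \cdot \rfloor$ denotes the floor function.

There are $\lfloor N^{1/m} \rfloor$ grid points in each of the $m$ dimensions of $Y$, for a total of $\lfloor N^{1/m} \rfloor^m$ grid points. Thus, each grid element is a cube with edge length $1/(\lfloor N^{1/m} \rfloor - 1)$.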
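The uniform grid (III.12) can be generated as follows. This is a sketch under the assumption $Y = [0, 1]^m$; the helper names are illustrative. Note that the grid has $\lfloor N^{1/m} \rfloor^m$ points, which can be fewer than $N$.

```python
import itertools


def points_per_dim(N: int, m: int) -> int:
    """Exact integer floor(N**(1/m)).

    A plain float power can land just below an integer (e.g. 64 ** (1/3)),
    so the rounded guess is corrected with exact integer arithmetic.
    """
    k = int(round(N ** (1.0 / m)))
    while k ** m > N:
        k -= 1
    while (k + 1) ** m <= N:
        k += 1
    return k


def uniform_grid(N: int, m: int):
    """Uniform grid on [0, 1]^m following (III.12); intended for N > 2**m.

    Returns the list of grid points and the edge length of a grid element.
    """
    k = points_per_dim(N, m)
    if k < 2:
        raise ValueError("need at least 2 points per dimension (N > 2**m)")
    axis = [i / (k - 1) for i in range(k)]
    return list(itertools.product(axis, repeat=m)), 1.0 / (k - 1)


points, edge = uniform_grid(30, 2)    # floor(sqrt(30)) = 5 points per axis
print(len(points), edge)
```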

To continue the discussion, we need a way to quantify the "distance" between two sets. We use the Hausdorff distance for this purpose. The Hausdorff distance between $Y$ and $Y_N$ is defined as

$$\mathrm{dist}(Y, Y_N) = \max_{y \in Y} \min_{y' \in Y_N} \|y' - y\|. \qquad \text{(III.13)}$$

That is, the Hausdorff distance between $Y$ and $Y_N$ is the maximum distance between any point $y \in Y$ and its nearest grid point in $Y_N$. (Because $Y_N \subset Y$, this one-sided distance coincides with the symmetric Hausdorff distance.)

For the unit cube example, the Hausdorff distance between $Y$ and $Y_N$ is the distance from the center to a corner of a grid element, which, based on the Euclidean distance between two points in $m$-dimensional space, is

$$\mathrm{dist}(Y, Y_N) = \frac{\sqrt{m}}{2(\lfloor N^{1/m} \rfloor - 1)}. \qquad \text{(III.14)}$$
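A quick numerical check of (III.14), assuming the grid from (III.12) with $k = \lfloor N^{1/m} \rfloor$ points per axis: on a Cartesian grid the nearest grid point is found by rounding each coordinate, so sampled distances never exceed $\sqrt{m}/(2(k - 1))$, and the center of a grid element attains it. The dimension, sample size, and seed are arbitrary.

```python
import math
import random


def nearest_grid_point(y, k):
    # On the Cartesian grid {0, 1/(k-1), ..., 1}^m the nearest grid point is
    # obtained coordinate-wise by rounding to the closest grid level.
    return [round(t * (k - 1)) / (k - 1) for t in y]


def dist_to_grid(y, k):
    return math.dist(y, nearest_grid_point(y, k))


def hausdorff_formula(k, m):
    # (III.14), written in terms of k = floor(N**(1/m)) points per dimension.
    return math.sqrt(m) / (2 * (k - 1))


m, k = 3, 5
rng = random.Random(0)
samples = [[rng.random() for _ in range(m)] for _ in range(10000)]
worst_sampled = max(dist_to_grid(y, k) for y in samples)
center = [0.5 / (k - 1)] * m      # center of the grid element at the origin
print(worst_sampled, dist_to_grid(center, k), hausdorff_formula(k, m))
```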

Let $y \in \hat{Y}(x) = \arg\max_{y \in Y} \phi(x, y)$, and let $y_1 \in Y_N$ be the nearest grid point to $y$. Based on the definition of the Hausdorff distance,

$$\|y_1 - y\| \le \frac{\sqrt{m}}{2(\lfloor N^{1/m} \rfloor - 1)}. \qquad \text{(III.15)}$$
Under Assumption III.2, there exists a constant $L < \infty$ such that

$$\phi(x, y) - \phi(x, y_1) \le L \|y_1 - y\|. \qquad \text{(III.16)}$$

Let $\hat{Y}_N(x) = \arg\max_{y \in Y_N} \phi(x, y)$ and $y_2 \in \hat{Y}_N(x)$. Thus, $\phi(x, y_2) \ge \phi(x, y_1)$ and

$$\phi(x, y) - \phi(x, y_2) \le \phi(x, y) - \phi(x, y_1). \qquad \text{(III.17)}$$

Since $\psi(x) = \phi(x, y)$ and $\psi_N(x) = \phi(x, y_2)$ by definition, for $N > 2^m$,

$$0 \le \psi(x) - \psi_N(x) \le \frac{L \sqrt{m}}{2(\lfloor N^{1/m} \rfloor - 1)} \le \frac{L \sqrt{m}}{2(N^{1/m} - 2)}. \qquad \text{(III.18)}$$

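For any fixed $L' > L/2$, there exists an $N_1 \in \mathbb{N}$ such that

$$\frac{L \sqrt{m}}{2(N^{1/m} - 2)} \le \frac{L' \sqrt{m}}{N^{1/m}} \qquad \text{(III.19)}$$

for all $N \ge N_1$. This completes the verification that Assumption III.3 holds for the unit cube with a uniform grid.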
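Rearranging (III.19) gives an explicit choice of $N_1$. Multiplying both sides by $2N^{1/m}(N^{1/m} - 2) > 0$ shows that (III.19) is equivalent to $N^{1/m}(2L' - L) \ge 4L'$, which can hold only if $L' > L/2$. The sketch below computes the resulting smallest $N_1$ for made-up constants; the function names are illustrative.

```python
import math


def lhs_III19(N: int, m: int, L: float) -> float:
    # Left-hand side of (III.19); valid for N > 2**m.
    return L * math.sqrt(m) / (2.0 * (N ** (1.0 / m) - 2.0))


def delta_m(N: int, m: int, L_prime: float) -> float:
    # Right-hand side of (III.19), i.e. Delta_m(N) in (III.11).
    return L_prime * math.sqrt(m) / N ** (1.0 / m)


def smallest_N1(m: int, L: float, L_prime: float) -> int:
    # (III.19) rearranges to N**(1/m) * (2 L' - L) >= 4 L', which is
    # satisfiable only if L' > L / 2; then N_1 = ceil((4 L' / (2 L' - L))**m).
    if L_prime <= L / 2.0:
        raise ValueError("need L' > L/2")
    t = 4.0 * L_prime / (2.0 * L_prime - L)
    return math.ceil(t ** m)


N1 = smallest_N1(m=2, L=1.0, L_prime=1.0)
print(N1)
```

With $m = 2$ and $L = L' = 1$ this gives $N_1 = 16$, and (III.19) fails at $N = 15$.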

We need the following strong convexity assumption, which is standard for rate-of-convergence analysis; see, for example, Polak et al. (1992).

Assumption III.4. The function $\phi(\cdot, y)$, for all $y \in Y$, is twice continuously differentiable, and there exists an $\alpha \in (0, \infty)$ such that

$$\alpha \|z\|^2 \le \langle z, \nabla^2_{xx} \phi(x, y) z \rangle \qquad \text{(III.20)}$$

for all $x, z \in \mathbb{R}^d$ and $y \in Y$. □


In the following subsections, we derive the rate of decay of error bounds of fixed (Algorithm III.1) and adaptive (Algorithm III.2) discretization algorithms for solving (SMX) in terms of computational work. Hence, we need to define precisely what we mean by an error bound for the various algorithms. For fixed discretization algorithms (Algorithm III.1), we denote the $n$th iterate of a fixed discretization algorithm with discretization parameter $N$ by $x_n^N$. Suppose that a computational budget $b \in (0, \infty)$ is allocated to solve (SMX), and that the computational work required to run $n_b \in \mathbb{N}$ iterations of a fixed discretization algorithm on $(\mathrm{SMX}_{N_b})$, $N_b \in \mathbb{N}$, is no larger than $b$. We refer to the quantity $\psi(x_{n_b}^{N_b}) - \psi^*$ as the error, and to an upper bound on this quantity as an error bound. There are many possible error bounds; we define several specific ones for the analysis.

For the rate-of-decay analyses in this section, we consider the fixed discretization Algorithm III.1 with an ideal algorithm map that solves $(\mathrm{SMX}_{N_b})$, $N_b \in \mathbb{N}$, exactly in one iteration; the adaptive discretization Algorithm III.2; and Algorithm III.1 with algorithm maps that have quadratic, linear, and sublinear rates of convergence, as well as with a specific smoothing algorithm.

We  need  the  following  assumption  on  computational  work  and  budget  for  the 
rate  analysis. 

Assumption III.5. There exist $\sigma \in (0, \infty)$ and $\nu \in [1, \infty)$ such that the computational work required in each iteration of the algorithm map in solving $(\mathrm{SMX}_N)$ is no larger than $\sigma N^\nu$ for all $N \in \mathbb{N}$. □

The preceding assumption holds with $\nu = 1$ for the two smoothing algorithms proposed in Chapter II, and with $\nu = 3$ for the SQP and PPP algorithms discussed in Chapter II. Suppose that the assumption holds for the algorithm map under consideration and that a computational budget $b \in \mathbb{N}$ is allocated to Algorithm III.1 to run $n_b$ iterations of the algorithm map on $(\mathrm{SMX}_{N_b})$. Then $N_b$ and $n_b$ must be picked such that

$$\sigma N_b^\nu n_b \le b. \qquad \text{(III.21)}$$

In the upcoming analyses, we see that the error bounds for the various algorithms often have two components. The first component is the error due to not reaching the optimal solution of the discretized problem $(\mathrm{SMX}_{N_b})$, which decreases monotonically as the number of iterations $n_b$ increases. The second component is due to the discretization error $\Delta_m(N_b)$, which decreases monotonically as $N_b$ increases. From this point onwards, we ignore the integrality of $N_b$ and $n_b$ to simplify the analysis; this does not affect the subsequent results, since our focus is on the asymptotic rate of decay of error bounds as $N_b, n_b \to \infty$. Since $N_b$ and $n_b$ are constrained by the inequality in (III.21), for any $N_1$ and $n_1$ that satisfy (III.21) with strict inequality, there exist $N_2 \ge N_1$ and $n_2 \ge n_1$ that satisfy (III.21) with equality, i.e.,

$$\sigma N_b^\nu n_b = b, \qquad \text{(III.22)}$$

and these produce an error bound that is no larger. Thus, we use (III.22) instead of (III.21) in the subsequent analysis.

Let $\{N_b\}_{b \in \mathbb{N}}$ and $\{n_b\}_{b \in \mathbb{N}}$ be sequences that satisfy (III.22) for all $b \in \mathbb{N}$. We call $\{(N_b, n_b)\}_{b \in \mathbb{N}}$ a candidate selection. Suppose that a particular algorithm has error bound $e_b$, $b \in \mathbb{N}$. There are many candidate selections that make $\{e_b\}_{b \in \mathbb{N}}$ converge to zero. However, some candidate selections result in faster rates than others, and we want to find these selections. We note that the problem of choosing algorithm parameter values to optimize algorithm efficiency has also been addressed in the area of simulation optimization (Pasupathy, 2010; Lee & Glynn, 2003).

We first consider the rate of decay of the error bound $e_b$ for an ideal algorithm map, which solves $(\mathrm{SMX}_{N_b})$ exactly in one iteration for any $N_b \in \mathbb{N}$.

2.  Ideal  Algorithm  Map 

Suppose that Assumptions III.3 and III.5 hold. Suppose also that a computational budget $b \in \mathbb{N}$ is allocated to Algorithm III.1 with an ideal algorithm map to solve $(\mathrm{SMX}_{N_b})$. Since $(\mathrm{SMX}_{N_b})$ is solved exactly in one iteration, $\psi_{N_b}(x_{n_b}^{N_b}) - \psi^*_{N_b} = 0$. Based on Proposition III.1(iii) and (III.10),

$$\psi(x_{n_b}^{N_b}) - \psi^* \le \psi_{N_b}(x_{n_b}^{N_b}) + \Delta_m(N_b) - \psi^*_{N_b} = \Delta_m(N_b) = \frac{L' \sqrt{m}}{N_b^{1/m}}, \qquad \text{(III.23)}$$

where $L'$ is as in Assumption III.3. We define

$$e_b^{\mathrm{ideal}} \triangleq \frac{L' \sqrt{m}}{N_b^{1/m}}. \qquad \text{(III.24)}$$


Theorem III.6. Suppose that Assumptions III.3 and III.5 hold. Suppose also that a computational budget $b \in \mathbb{N}$ is allocated to Algorithm III.1 with an ideal algorithm map to solve $(\mathrm{SMX}_{N_b})$. Then the error bound

$$e_b^{\mathrm{ideal}} = L' \sqrt{m} \left(\frac{\sigma}{b}\right)^{1/(m\nu)} \qquad \text{(III.25)}$$

for all $b \in \mathbb{N}$, where $L'$ is as in Assumption III.3, and $\sigma$ and $\nu$ are as in Assumption III.5.

Proof. Since an ideal algorithm map is used, $n_b = 1$ for all $b \in \mathbb{N}$, and from (III.22), $N_b = (b/\sigma)^{1/\nu}$. The conclusion follows by substituting $N_b$ into (III.24). □

The result above states that $e_b^{\mathrm{ideal}}$ decays at an asymptotic sublinear rate of $b^{-1/(m\nu)}$ as $b \to \infty$. Since the ideal algorithm map solves the discretized problems exactly (in one iteration), the rate-of-decay result for $e_b^{\mathrm{ideal}}$ determines the rate at which the error between the function values at the solutions of the discretized problems and the function value at the solution of the semi-infinite problem decays as the level of discretization increases. Similarly, the rate-of-convergence results in Still (2001) and Shapiro (2009) determine the rate at which the error between the solutions of the discretized problems and the solution of the semi-infinite problem decays as the level of discretization increases. Thus, the rate-of-decay result for $e_b^{\mathrm{ideal}}$ is related to the rate-of-convergence results in Still (2001) and Shapiro (2009).
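Theorem III.6 can be checked numerically. The sketch assumes made-up values for $\sigma$, $\nu$, $m$, and $L'$, chooses $N_b$ from (III.22) with $n_b = 1$, and confirms that doubling the budget shrinks $e_b^{\mathrm{ideal}}$ by the factor $2^{1/(m\nu)}$ implied by (III.25).

```python
import math


def ideal_selection(b: float, sigma: float, nu: float):
    # Ideal map: n_b = 1, so (III.22) gives N_b = (b / sigma)**(1 / nu).
    return (b / sigma) ** (1.0 / nu), 1.0


def e_ideal(b: float, sigma: float, nu: float, m: int, L_prime: float) -> float:
    # (III.24): e_b^ideal = L' sqrt(m) / N_b**(1/m).
    N_b, _ = ideal_selection(b, sigma, nu)
    return L_prime * math.sqrt(m) / N_b ** (1.0 / m)


sigma, nu, m, L_prime = 2.0, 3.0, 2, 1.5     # made-up constants
b = 1e6
ratio = e_ideal(b, sigma, nu, m, L_prime) / e_ideal(2 * b, sigma, nu, m, L_prime)
print(ratio, 2 ** (1.0 / (m * nu)))          # doubling b divides the bound by 2**(1/(m nu))
```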

3.  Adaptive  Discretization  Algorithm 

The preceding result for fixed discretization can be generalized to a potentially more efficient adaptive discretization algorithm as follows. For this adaptive discretization algorithm, we adopt a notation for the iterates that differs from that of the fixed discretization algorithm; specifically, we denote the $j$th iterate at the $i$th stage by $x_{i,j}$.

Algorithm III.2. Adaptive Discretization Algorithm
Data: $x_0 \in \mathbb{R}^d$.

Parameters: Number of stages $s \in \mathbb{N}$, discretization parameters $\{N_i\}_{i=1}^s$, $N_i \in \mathbb{N}$, numbers of iterations in the stages $\{n_i\}_{i=1}^s$, $n_i \in \mathbb{N}$, and parameters required for the algorithm map.

Step 1. Set $i = 1$.

Step 2. If $i > 1$, warm-start from the last iterate of the previous stage by setting $x_{i,0} = x_{i-1,n_{i-1}}$. Else, set $x_{i,0} = x_0$.

Step 3. Generate a sequence $\{x_{i,j}\}_{j=0}^{n_i}$ by applying a finite minimax algorithm map to $(\mathrm{SMX}_{N_i})$.

Step 4. If $i < s$, replace $i$ by $i + 1$ and go to Step 2. Else, end. □
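A minimal sketch of the stage loop of Algorithm III.2, together with the work count on the left-hand side of (III.29). The interface `algorithm_map(x, N)`, which performs one iteration on $(\mathrm{SMX}_N)$, is a hypothetical assumption, and the contraction used in the demonstration is a placeholder rather than a real minimax method.

```python
from typing import Callable, List, Sequence, Tuple


def adaptive_discretization(
    x0: float,
    stages: Sequence[Tuple[int, int]],
    algorithm_map: Callable[[float, int], float],
) -> Tuple[float, List[float]]:
    """Sketch of Algorithm III.2.

    stages: pairs (N_i, n_i) for i = 1, ..., s.
    algorithm_map(x, N): one iteration of a finite minimax method on (SMX_N)
    started at x (hypothetical interface).
    Returns the final iterate x_{s, n_s} and the starting point of every stage.
    """
    x = x0
    starts = []
    for N_i, n_i in stages:          # Steps 1 and 4: loop over the s stages
        starts.append(x)             # Step 2: x_{i,0} = x_0 or x_{i-1, n_{i-1}}
        for _ in range(n_i):         # Step 3: n_i iterations on (SMX_{N_i})
            x = algorithm_map(x, N_i)
    return x, starts


def total_work(stages: Sequence[Tuple[int, int]], sigma: float, nu: float) -> float:
    # Left-hand side of (III.29) under Assumption III.5.
    return sigma * sum(N_i ** nu * n_i for N_i, n_i in stages)


stages = [(4, 2), (16, 3)]
# Placeholder map (not a minimax method): contracts x toward 0.5 and ignores N.
x_final, starts = adaptive_discretization(2.5, stages, lambda x, N: 0.5 * (x + 0.5))
print(x_final, starts, total_work(stages, sigma=1.0, nu=1.0))
```

The recorded starting points show the warm start of Step 2: the second stage begins where the first one ended.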

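Suppose that a computational budget $b \in \mathbb{N}$ is allocated to Algorithm III.2, with an algorithm map of arbitrary rate of convergence, to solve (SMX). Suppose also that Assumptions III.3 and III.5 hold.

Based on Proposition III.1(iii) and (III.10),

$$\psi(x_{s,n_s}) - \psi^* \le \psi_{N_s}(x_{s,n_s}) + \Delta_m(N_s) - \psi^*_{N_s}. \qquad \text{(III.26)}$$

We define

$$e_b^{\mathrm{adaptive}} \triangleq \psi_{N_s}(x_{s,n_s}) + \Delta_m(N_s) - \psi^*_{N_s}. \qquad \text{(III.27)}$$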


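Proposition III.7. The error bound satisfies

$$e_b^{\mathrm{adaptive}} \ge L' \sqrt{m} \left(\frac{\sigma}{b}\right)^{1/(m\nu)} \qquad \text{(III.28)}$$

for all $b \in \mathbb{N}$, where $L'$ is as in Assumption III.3, and $\sigma$ and $\nu$ are as in Assumption III.5.

Proof. The parameters for Algorithm III.2, $s \in \mathbb{N}$, $\{N_i\}_{i=1}^s$, $N_i \in \mathbb{N}$, and $\{n_i\}_{i=1}^s$, $n_i \in \mathbb{N}$, satisfy

$$\sigma \left(N_1^\nu n_1 + N_2^\nu n_2 + \cdots + N_s^\nu n_s\right) = b. \qquad \text{(III.29)}$$

This implies that $\sigma N_s^\nu \le b$, and thus,

$$e_b^{\mathrm{adaptive}} = \psi_{N_s}(x_{s,n_s}) + \Delta_m(N_s) - \psi^*_{N_s} \ge \Delta_m(N_s) \ge L' \sqrt{m} \left(\frac{\sigma}{b}\right)^{1/(m\nu)}, \qquad \text{(III.30)}$$

based on (III.11) and the assumption that $\Delta_m(\cdot)$ is a monotonically decreasing function. □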

The result above indicates that $e_b^{\mathrm{adaptive}}$ for Algorithm III.2, with an algorithm map of any convergence rate, asymptotically decays at a rate no faster than $b^{-1/(m\nu)}$ as $b \to \infty$. In the following subsections, we show that this optimal rate of $b^{-1/(m\nu)}$ can be achieved using the fixed discretization Algorithm III.1 with certain algorithm maps.
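Proposition III.7 can be illustrated by drawing random stage plans that exhaust the budget: whatever the plan, the last discretization parameter satisfies $\sigma N_s^\nu \le b$, so $\Delta_m(N_s)$ cannot fall below the right-hand side of (III.28). The constants and the random plan generator (one iteration per stage) are assumptions made only for this illustration.

```python
import math
import random


def delta_m(N: float, m: int, L_prime: float) -> float:
    # Discretization error (III.11).
    return L_prime * math.sqrt(m) / N ** (1.0 / m)


def random_stage_plan(b: float, sigma: float, nu: float, s: int, rng: random.Random):
    # Hypothetical plan: split the budget b over s stages (one iteration each),
    # with N_i increasing across stages, so that (III.29) holds with equality.
    shares = sorted(rng.random() + 1e-9 for _ in range(s))
    total = sum(shares)
    return [((share / total * b / sigma) ** (1.0 / nu), 1) for share in shares]


b, sigma, nu, m, L_prime = 1e6, 2.0, 3.0, 2, 1.0                  # made-up constants
lower = L_prime * math.sqrt(m) * (sigma / b) ** (1.0 / (m * nu))  # right side of (III.28)
rng = random.Random(1)
plans = [random_stage_plan(b, sigma, nu, 4, rng) for _ in range(100)]
print(min(delta_m(plan[-1][0], m, L_prime) for plan in plans), lower)
```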

We say that an algorithm map converges uniformly when applied to $(\mathrm{SMX}_N)$, $N \in \mathbb{N}$, if the respective constants $c$ and $n_1$ in Section I.D.2 do not depend on $N$.


4.  Quadratically  Convergent  Algorithm  Map 

We  obtain  an  error  bound  for  Algorithm  III.  1  with  a  uniform  quadratically 
convergent  algorithm  map  in  the  next  lemma.  We  refer  to  Section  I.D  for  definitions 
of  the  various  rates  of  convergence  and  uniform  convergence. 


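Lemma III.8. Suppose that Assumptions III.3 and III.4 hold. Suppose also that Algorithm III.1 with a uniform quadratically convergent algorithm map is used to solve (SMX), i.e., there exist $n_1 \in \mathbb{N}_0$, $n_1 < \infty$, and $c_1 \in (0, \infty)$ such that

$$\frac{\psi_N(x_{n+1}^N) - \psi^*_N}{\left[\psi_N(x_n^N) - \psi^*_N\right]^2} \le c_1 \qquad \text{(III.31)}$$

for all $n \ge n_1$. Then there exist $c, \kappa < \infty$ such that for all $n \ge n_1$ and $N \in \mathbb{N}$, $N \ge N_1$,

$$\psi(x_n^N) - \psi^* \le \kappa c^{2^n} + \Delta_m(N). \qquad \text{(III.32)}$$

Proof. Based on Proposition III.1(iii), (III.10), and (III.31),

$$\begin{aligned}
\psi(x_n^N) - \psi^* &\le \psi_N(x_n^N) + \Delta_m(N) - \psi^*_N \\
&\le \frac{1}{c_1}\left[c_1\left(\psi_N(x_{n_1}^N) - \psi^*_N\right)\right]^{2^{n-n_1}} + \Delta_m(N). \qquad \text{(III.33)}
\end{aligned}$$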
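From (III.10), $-\psi^*_N \le -\psi(x^*_N) + \Delta_m(N)$. Based on Assumption III.3, $\Delta_m(N)$ is a monotonically decreasing function; thus, $\Delta_m(N) \le \Delta_m(N_1)$ for all $N \in \mathbb{N}$, $N \ge N_1$. Since $\psi_N(x_{n_1}^N) \le \psi(x_{n_1}^N)$ by Proposition III.1(i) and $\psi(x^*_N) \ge \psi(x^*)$ by definition,

$$\begin{aligned}
\frac{1}{c_1}\left[c_1\left(\psi_N(x_{n_1}^N) - \psi^*_N\right)\right]^{2^{n-n_1}}
&\le \frac{1}{c_1}\left[c_1\left(\psi(x_{n_1}^N) - \psi^* + \Delta_m(N_1)\right)\right]^{2^{n-n_1}} \\
&= \frac{1}{c_1}\left(\left[c_1\left(\psi(x_{n_1}^N) - \psi^* + \Delta_m(N_1)\right)\right]^{2^{-n_1}}\right)^{2^n}, \qquad \text{(III.34)}
\end{aligned}$$

where $N_1$ is as in Assumption III.3. Above, we use the fact that, for uniform convergence, $n_1$ is independent of $N$.

Under Assumption III.4, $x_{n_1}^N$ is bounded for any $n_1 \in \mathbb{N}$ and $N \in \mathbb{N}$, $N \ge N_1$. Since $\psi(\cdot)$ is continuous, $\psi(x_{n_1}^N)$ is bounded for any $n_1 \in \mathbb{N}$ and $N \in \mathbb{N}$, $N \ge N_1$. Based on Assumption III.4, $\psi^*$ is finite. As $N_1 \in \mathbb{N}$, $\Delta_m(N_1) < \infty$ based on Assumption III.3 and (III.11). Finally, $c_1$ and $n_1$ are independent of $N$ based on the assumption of uniform convergence; thus, $c < \infty$ and $\kappa < \infty$. □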
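The key step in (III.33) unrolls the quadratic-rate inequality (III.31). Assuming the worst case $e_{n+1} = c_1 e_n^2$, where $e_n = \psi_N(x_n^N) - \psi^*_N$, the sketch checks numerically that $e_{n_1 + j} = c_1^{-1}(c_1 e_{n_1})^{2^j}$. The errors collapse only when $c_1 e_{n_1} < 1$; correspondingly, the bound (III.32) decreases in $n$ only when $c < 1$. The starting values are made up.

```python
import math


def quadratic_errors(e_start: float, c1: float, steps: int) -> list:
    # Worst case permitted by (III.31): e_{n+1} = c1 * e_n**2 for n >= n_1.
    errs = [e_start]
    for _ in range(steps):
        errs.append(c1 * errs[-1] ** 2)
    return errs


def closed_form(e_start: float, c1: float, j: int) -> float:
    # The bound in (III.33): e_{n_1 + j} = (1 / c1) * (c1 * e_{n_1})**(2**j).
    return (c1 * e_start) ** (2 ** j) / c1


errs = quadratic_errors(0.1, 3.0, 6)    # c1 * e_{n_1} = 0.3 < 1, so errors collapse
print(errs)
```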

From (III.32), we define the error bound for Algorithm III.1 with a quadratically convergent algorithm map as

$$e_b^{\mathrm{quad}} \triangleq \kappa c^{2^{n_b}} + \Delta_m(N_b). \qquad \text{(III.35)}$$


The next result states that if we choose the candidate selections in a certain way, then a fixed discretization algorithm with a quadratically convergent algorithm map can achieve the same optimal asymptotic rate of decay of error bound as Algorithm III.2.

We use $\log(\cdot)$ to denote the natural logarithm.

Theorem III.9. Suppose that Assumptions III.3, III.4, and III.5 hold. Suppose also that a computational budget $b \in \mathbb{N}$ is allocated to Algorithm III.1 with a uniform quadratically convergent algorithm map with rate of convergence given by (III.31) to solve (SMX). If

$$N_b = \left[\frac{b \log 2}{\sigma\left[\log\log(b/\sigma) - \log(-m\nu\log c)\right]}\right]^{1/\nu} \qquad \text{(III.36)}$$

and

$$n_b = \frac{\log\log(b/\sigma) - \log(-m\nu\log c)}{\log 2} \qquad \text{(III.37)}$$

for all $b \in \mathbb{N}$, then (III.22) is satisfied and

$$\lim_{b \to \infty} \frac{\log e_b^{\mathrm{quad}}}{\log b} = -\frac{1}{m\nu}, \qquad \text{(III.38)}$$

where $m \in \mathbb{N}$ is the uncertainty dimension and $\nu$ is as defined in Assumption III.5.

Proof. Let $D_b = \log\log(b/\sigma) - \log(-m\nu\log c)$ denote the common bracketed term in (III.36) and (III.37). From (III.11), (III.22), and (III.32),

$$\begin{aligned}
\log e_b^{\mathrm{quad}}
&= \log\left(\exp\left[\log\kappa + 2^{n_b}\log c\right] + \exp\left[\log(L'\sqrt{m}) - \log\left(N_b^{1/m}\right)\right]\right) \\
&= \log\left(\exp\left[\log\kappa + 2^{D_b/\log 2}\log c\right] + \exp\left[\log(L'\sqrt{m}) - \frac{1}{m\nu}\log\frac{b\log 2}{\sigma D_b}\right]\right) \\
&= \log\left(\exp\left[\log\kappa + (\log c)\exp\left[\log\log(b/\sigma) - \log(-m\nu\log c)\right]\right] + L'\sqrt{m}\left(\frac{\sigma D_b}{b\log 2}\right)^{1/(m\nu)}\right) \\
&= \log\left(\exp\left[\log\kappa - \frac{1}{m\nu}\log\frac{b}{\sigma}\right] + L'\sqrt{m}\left(\frac{\sigma D_b}{b\log 2}\right)^{1/(m\nu)}\right) \\
&= \log\left(L'\sqrt{m}\left(\frac{\sigma D_b}{b\log 2}\right)^{1/(m\nu)}\left[\frac{\kappa(\sigma/b)^{1/(m\nu)}}{L'\sqrt{m}\left(\sigma D_b/(b\log 2)\right)^{1/(m\nu)}} + 1\right]\right), \qquad \text{(III.39)}
\end{aligned}$$

where we use the fact that $2^{x/\log 2} = \exp(x)$ for $x \in \mathbb{R}$. Simplifying the first term within the square brackets, we obtain

$$\frac{\kappa(\sigma/b)^{1/(m\nu)}}{L'\sqrt{m}\left(\sigma D_b/(b\log 2)\right)^{1/(m\nu)}} = \frac{\kappa}{L'\sqrt{m}}\left(\frac{\log 2}{D_b}\right)^{1/(m\nu)}. \qquad \text{(III.40)}$$

Since $D_b \to \infty$ as $b \to \infty$,

$$\lim_{b\to\infty}\frac{\kappa}{L'\sqrt{m}}\left(\frac{\log 2}{D_b}\right)^{1/(m\nu)} = 0, \qquad \text{(III.41)}$$

and then, by continuity of the $\log(\cdot)$ function,

$$\lim_{b\to\infty}\log\left[\frac{\kappa}{L'\sqrt{m}}\left(\frac{\log 2}{D_b}\right)^{1/(m\nu)} + 1\right] = 0. \qquad \text{(III.42)}$$

Therefore, continuing from (III.39),

$$\begin{aligned}
\lim_{b\to\infty}\frac{\log e_b^{\mathrm{quad}}}{\log b}
&= \lim_{b\to\infty}\left[-\frac{1}{m\nu}\cdot\frac{\log b}{\log b} + \frac{\log(L'\sqrt{m})}{\log b} + \frac{\log\sigma - \log\log 2}{m\nu\log b} + \frac{\log D_b}{m\nu\log b}\right] \\
&= -\frac{1}{m\nu}. \qquad \text{(III.43)}
\end{aligned}$$

This completes the proof. □

Roughly, Theorem III.9 says that if $N_b$ and $n_b$ are chosen as in (III.36) and (III.37), respectively, then for large $b$, increasing $b$ by a factor $b_1 \in (1, \infty)$ decreases $e_b^{\mathrm{quad}}$ by a factor of approximately $b_1^{1/(m\nu)}$.
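The candidate selection of Theorem III.9 is easy to evaluate. The sketch below, with made-up constants ($\sigma = \nu = \kappa = L' = 1$, $m = 2$, $c = 0.5$), checks that (III.36) and (III.37) exhaust the budget in (III.22), that $c^{2^{n_b}} = (\sigma/b)^{1/(m\nu)}$, and that the empirical slope of $\log e_b^{\mathrm{quad}}$ against $\log b$ is already close to $-1/(m\nu)$ for very large $b$.

```python
import math


def quad_selection(b, sigma, nu, m, c):
    """Candidate selection (III.36)-(III.37); needs c in (0, 1) and b large enough that D_b > 0."""
    D = math.log(math.log(b / sigma)) - math.log(-m * nu * math.log(c))
    n_b = D / math.log(2.0)
    N_b = (b * math.log(2.0) / (sigma * D)) ** (1.0 / nu)
    return N_b, n_b


def e_quad(b, sigma, nu, m, c, kappa, L_prime):
    # Error bound (III.35) with Delta_m from (III.11).
    N_b, n_b = quad_selection(b, sigma, nu, m, c)
    return kappa * c ** (2.0 ** n_b) + L_prime * math.sqrt(m) / N_b ** (1.0 / m)


params = dict(sigma=1.0, nu=1.0, m=2, c=0.5, kappa=1.0, L_prime=1.0)   # made-up constants
b1, b2 = 1e50, 1e100
slope = (math.log(e_quad(b2, **params)) - math.log(e_quad(b1, **params))) / (
    math.log(b2) - math.log(b1))
print(slope, -1.0 / (params["m"] * params["nu"]))
```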


5.  Linearly  Convergent  Algorithm  Map 

We  next  obtain  an  error  bound  for  Algorithm  III.  1  with  a  uniform  linearly 
convergent  algorithm  map. 


Lemma III.10. Suppose that Assumptions III.3 and III.4 hold. Suppose also that Algorithm III.1 with a uniform linearly convergent algorithm map is used to solve (SMX), i.e., there exist $n_1 \in \mathbb{N}_0$, $n_1 < \infty$, $N_1 \in \mathbb{N}$, and $c \in (0, 1)$ such that

$$\frac{\psi_N(x_{n+1}^N) - \psi^*_N}{\psi_N(x_n^N) - \psi^*_N} \le c \qquad \text{(III.44)}$$

for all $n \ge n_1$ and $N \ge N_1$. Then there exists a $\kappa < \infty$ such that for all $n \ge n_1$ and $N \ge N_1$,

$$\psi(x_n^N) - \psi^* \le c^n \kappa + \Delta_m(N). \qquad \text{(III.45)}$$

Proof. Based on Proposition III.1(iii), (III.10), (III.44), and arguments similar to those in the proof of Lemma III.8,

$$\begin{aligned}
\psi(x_n^N) - \psi^* &\le \psi_N(x_n^N) + \Delta_m(N) - \psi^*_N \\
&\le c^{n-n_1}\left[\psi_N(x_{n_1}^N) - \psi^*_N\right] + \Delta_m(N) \\
&\le c^n\left(c^{-n_1}\left[\psi(x_{n_1}^N) - \psi^* + \Delta_m(N_1)\right]\right) + \Delta_m(N), \qquad \text{(III.46)}
\end{aligned}$$

where $N_1$ is as in Assumption III.3. The remaining part of the proof follows the same arguments as the proof of Lemma III.8. □

From (III.45), we define the error bound for Algorithm III.1 with a linearly convergent algorithm map as

$$e_b^{\mathrm{linear}} \triangleq c^{n_b}\kappa + \Delta_m(N_b). \qquad \text{(III.47)}$$


The  next  result  states  that  a  fixed  discretization  algorithm  with  a  linearly 
convergent  algorithm  map  can  achieve  the  same  asymptotic  rate  of  decay  of  error 
bound  as  Algorithm  III. 2. 


Theorem III.11. Suppose that Assumptions III.3, III.4, and III.5 hold. Suppose also that a computational budget $b \in \mathbb{N}$ is allocated to Algorithm III.1 with a uniform linearly convergent algorithm map with rate of convergence given by (III.44) to solve (SMX). If

$$N_b = \left(\frac{-m b\nu\log c}{\sigma\log b}\right)^{1/\nu} \qquad \text{(III.48)}$$

and

$$n_b = \frac{\log b}{-m\nu\log c} \qquad \text{(III.49)}$$

for all $b \in \mathbb{N}$, then (III.22) is satisfied and

$$\lim_{b\to\infty}\frac{\log e_b^{\mathrm{linear}}}{\log b} = -\frac{1}{m\nu}, \qquad \text{(III.50)}$$

where $m \in \mathbb{N}$ is the uncertainty dimension and $\nu$ is as defined in Assumption III.5.


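Proof. From (III.11), (III.22), and (III.45),

$$\begin{aligned}
\log e_b^{\mathrm{linear}}
&= \log\left(\exp\left[\frac{b\log c}{\sigma N_b^\nu} + \log\kappa\right] + \exp\left[\log(L'\sqrt{m}) - \log\left(N_b^{1/m}\right)\right]\right) \\
&= \log\left(\exp\left[\frac{\sigma b\log b\log c}{-\sigma m b\nu\log c} + \log\kappa\right] + \exp\left[\log(L'\sqrt{m}) - \log\left(\frac{-m b\nu\log c}{\sigma\log b}\right)^{1/(m\nu)}\right]\right) \\
&= \log\left(\exp\left[\log(L'\sqrt{m}) - \log\left(\frac{-m b\nu\log c}{\sigma\log b}\right)^{1/(m\nu)}\right]\left(\frac{\exp\left[-\frac{\log b}{m\nu} + \log\kappa\right]}{\exp\left[\log(L'\sqrt{m}) - \log\left(\frac{-m b\nu\log c}{\sigma\log b}\right)^{1/(m\nu)}\right]} + 1\right)\right). \qquad \text{(III.51)}
\end{aligned}$$

Simplifying the first term within the outermost parentheses,

$$\frac{\exp\left[-\frac{\log b}{m\nu} + \log\kappa\right]}{\exp\left[\log(L'\sqrt{m}) - \log\left(\frac{-m b\nu\log c}{\sigma\log b}\right)^{1/(m\nu)}\right]} = \frac{\kappa\, b^{-1/(m\nu)}}{L'\sqrt{m}}\left(\frac{-m b\nu\log c}{\sigma\log b}\right)^{1/(m\nu)} = \frac{\kappa}{L'\sqrt{m}}\left(\frac{-m\nu\log c}{\sigma\log b}\right)^{1/(m\nu)}. \qquad \text{(III.52)}$$

Since

$$\lim_{b\to\infty}\frac{\kappa}{L'\sqrt{m}}\left(\frac{-m\nu\log c}{\sigma\log b}\right)^{1/(m\nu)} = 0, \qquad \text{(III.53)}$$

then by continuity of the $\log(\cdot)$ function,

$$\lim_{b\to\infty}\log\left[\frac{\kappa}{L'\sqrt{m}}\left(\frac{-m\nu\log c}{\sigma\log b}\right)^{1/(m\nu)} + 1\right] = 0. \qquad \text{(III.54)}$$

Therefore, from (III.51),

$$\begin{aligned}
\lim_{b\to\infty}\frac{\log e_b^{\mathrm{linear}}}{\log b}
&= \lim_{b\to\infty}\frac{\log(L'\sqrt{m}) - \log\left(\frac{-m b\nu\log c}{\sigma\log b}\right)^{1/(m\nu)}}{\log b} \\
&= \lim_{b\to\infty}\left[\frac{\log(L'\sqrt{m})}{\log b} - \frac{1}{m\nu}\cdot\frac{\log(-m\nu\log c) + \log b - \log\sigma - \log\log b}{\log b}\right] \\
&= -\frac{1}{m\nu}. \qquad \text{(III.55)}
\end{aligned}$$

This completes the proof. □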
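The same kind of check works for Theorem III.11. With the made-up constants below, the sketch confirms that (III.48) and (III.49) satisfy (III.22), that $c^{n_b} = b^{-1/(m\nu)}$, and that the empirical slope of $\log e_b^{\mathrm{linear}}$ against $\log b$ is close to $-1/(m\nu)$.

```python
import math


def linear_selection(b, sigma, nu, m, c):
    # Candidate selection (III.48)-(III.49); needs c in (0, 1).
    N_b = (-m * b * nu * math.log(c) / (sigma * math.log(b))) ** (1.0 / nu)
    n_b = math.log(b) / (-m * nu * math.log(c))
    return N_b, n_b


def e_linear(b, sigma, nu, m, c, kappa, L_prime):
    # Error bound (III.47) with Delta_m from (III.11).
    N_b, n_b = linear_selection(b, sigma, nu, m, c)
    return c ** n_b * kappa + L_prime * math.sqrt(m) / N_b ** (1.0 / m)


params = dict(sigma=1.0, nu=1.0, m=2, c=0.5, kappa=1.0, L_prime=1.0)   # made-up constants
b1, b2 = 1e50, 1e100
slope = (math.log(e_linear(b2, **params)) - math.log(e_linear(b1, **params))) / (
    math.log(b2) - math.log(b1))
print(slope, -1.0 / (params["m"] * params["nu"]))
```

The approach to $-1/(m\nu)$ is slower than in the quadratic case because $N_b$ in (III.48) carries a $1/\log b$ factor rather than $1/\log\log b$.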


6.  Sublinearly  Convergent  Algorithm  Map 

Since both the quadratically and linearly convergent algorithm maps attain the ideal rate of decay of error bound of $b^{-1/(m\nu)}$, one may think that the rate of convergence of the algorithm map does not matter. But that is not the case, as the following counterexample shows. We next obtain an error bound for Algorithm III.1 with a uniform sublinearly convergent algorithm map. We define the initial error $e_0 = \psi(x_0) - \psi^*$.


Lemma III.12. Suppose that Assumption III.3 holds. Suppose also that Algorithm III.1 with a uniform sublinearly convergent algorithm map is used to solve (SMX), i.e., there exist $N_1 \in \mathbb{N}$ and $a > 1$ such that

$$\frac{\psi_N(x_{n+1}^N) - \psi^*_N}{\psi_N(x_n^N) - \psi^*_N} \le 1 - \frac{1}{n + a} \qquad \text{(III.56)}$$

for all $n \in \mathbb{N}_0$, $N \in \mathbb{N}$, and $N \ge N_1$. Then for any $n \in \mathbb{N}$, $N \in \mathbb{N}$, $N \ge N_1$,

$$\psi(x_n^N) - \psi^* \le \frac{a - 1}{n - 1 + a}\, e_0 + 2\Delta_m(N). \qquad \text{(III.57)}$$


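Proof. Based on Proposition III.1, (III.10), and (III.56),

$$\begin{aligned}
\psi(x_n^N) - \psi^* &\le \psi_N(x_n^N) + \Delta_m(N) - \psi^*_N \\
&\le \left(1 - \frac{1}{a}\right)\left(1 - \frac{1}{1 + a}\right)\cdots\left(1 - \frac{1}{n - 1 + a}\right)\left[\psi_N(x_0) - \psi^*_N\right] + \Delta_m(N) \\
&= \frac{a - 1}{a}\cdot\frac{a}{1 + a}\cdots\frac{n - 2 + a}{n - 1 + a}\left[\psi_N(x_0) - \psi^*_N\right] + \Delta_m(N) \\
&\le \frac{a - 1}{n - 1 + a}\left[\psi(x_0) - \psi^* + \Delta_m(N)\right] + \Delta_m(N) \\
&\le \frac{a - 1}{n - 1 + a}\, e_0 + 2\Delta_m(N). \qquad \text{(III.58)}
\end{aligned}$$

This completes the proof. □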
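The telescoping product in (III.58) can be confirmed with exact rational arithmetic. The sketch assumes $a = 5/2$; any $a > 1$ works. Note that the resulting bound shrinks only like $1/n$, which is the root of the slower rate in Theorem III.13.

```python
from fractions import Fraction


def contraction_product(a: Fraction, n: int) -> Fraction:
    # Product of the factors 1 - 1/(i + a), i = 0, ..., n-1, permitted by (III.56).
    prod = Fraction(1)
    for i in range(n):
        prod *= 1 - 1 / (i + a)
    return prod


a = Fraction(5, 2)          # made-up a > 1
print([contraction_product(a, n) for n in (1, 2, 10)])
```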

From (III.57), we define the error bound for Algorithm III.1 with a sublinearly convergent algorithm map as

$$e_b^{\mathrm{sub}} \triangleq \frac{a - 1}{n_b - 1 + a}\, e_0 + 2\Delta_m(N_b). \qquad \text{(III.59)}$$

The  next  result  states  that  a  fixed  discretization  algorithm  with  a  sublinearly 
convergent  algorithm  map  is  unable  to  achieve  the  same  asymptotic  rate  of  decay  of 
error  bound  as  Algorithm  III. 2. 


Theorem III.13. Suppose that Assumptions III.3, III.4, and III.5 hold. Suppose also that a computational budget $b \in \mathbb{N}$ is allocated to Algorithm III.1 with a uniform sublinearly convergent algorithm map with rate of convergence given by (III.56) to solve (SMX). Then for all possible sequences $\{(N_b, n_b)\}_{b \in \mathbb{N}}$ satisfying (III.22),

$$\liminf_{b\to\infty}\frac{\log e_b^{\mathrm{sub}}}{\log b} > -\frac{1}{m\nu}, \qquad \text{(III.60)}$$

where $m \in \mathbb{N}$ is the uncertainty dimension and $\nu$ is as defined in Assumption III.5.
Proof. From (III.11), (III.22), and (III.57),

$$\begin{aligned}
\log e_b^{\mathrm{sub}}
&= \log\left(\exp\left[\log(a - 1) + \log e_0 - \log(n_b - 1 + a)\right] + \exp\left[\log(2L'\sqrt{m}) - \frac{\log b}{m\nu} + \frac{\log(\sigma n_b)}{m\nu}\right]\right) \\
&\ge \log\left(\max\left\{\exp\left[\log(a - 1) + \log e_0 - \log(n_b - 1 + a)\right],\ \exp\left[\log(2L'\sqrt{m}) - \frac{\log b}{m\nu} + \frac{\log(\sigma n_b)}{m\nu}\right]\right\}\right) \\
&= \max\left\{\log(a - 1) + \log e_0 - \log(n_b - 1 + a),\ \log(2L'\sqrt{m}) - \frac{\log b}{m\nu} + \frac{\log(\sigma n_b)}{m\nu}\right\}. \qquad \text{(III.61)}
\end{aligned}$$

Hence, for any $b \ge 2$,

$$\frac{\log e_b^{\mathrm{sub}}}{\log b} \ge \max\left\{\frac{\log(a - 1)}{\log b} + \frac{\log e_0}{\log b} - \frac{\log(n_b - 1 + a)}{\log b},\ \frac{\log(2L'\sqrt{m})}{\log b} - \frac{1}{m\nu} + \frac{\log(\sigma n_b)}{m\nu\log b}\right\}. \qquad \text{(III.62)}$$

For the sake of contradiction, we assume that there exists a sequence $\{(N_b, n_b)\}_{b \ge 2}$ where

$$\liminf_{b\to\infty}\frac{\log e_b^{\mathrm{sub}}}{\log b} \le -\frac{1}{m\nu}. \qquad \text{(III.63)}$$

This implies that for every $\epsilon > 0$, there exist an infinite subsequence $B \subset \mathbb{N}$ and a $b_1 \in B$, $b_1 \ge 2$, such that

$$\frac{\log e_b^{\mathrm{sub}}}{\log b} \le -\frac{1}{m\nu} + \epsilon \qquad \text{(III.64)}$$

for all $b \in B$, $b \ge b_1$. From (III.62) and (III.64),

$$\frac{\log(a - 1)}{\log b} + \frac{\log e_0}{\log b} - \frac{\log(n_b - 1 + a)}{\log b} \le -\frac{1}{m\nu} + \epsilon \qquad \text{(III.65)}$$

and

$$\frac{\log(2L'\sqrt{m})}{\log b} - \frac{1}{m\nu} + \frac{\log(\sigma n_b)}{m\nu\log b} \le -\frac{1}{m\nu} + \epsilon \qquad \text{(III.66)}$$

for all $b \in B$, $b \ge b_1$. From (III.65) and (III.66), there exists a $b_2 \in B$, $b_2 \ge b_1$, such that

$$\frac{\log n_b}{\log b} \ge \frac{1}{m\nu} - 2\epsilon \qquad \text{(III.67)}$$

and

$$\frac{\log n_b}{\log b} \le 2m\nu\epsilon \qquad \text{(III.68)}$$

for all $b \in B$, $b \ge b_2$. Since $\epsilon$ is arbitrary, (III.67) contradicts (III.68) for sufficiently small $\epsilon$, and the conclusion follows. □
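To see Theorem III.13 numerically, the sketch below eliminates $N_b$ through (III.22) and minimizes the bound (III.59) over $n_b$ by scanning a geometric grid, for made-up constants ($\sigma = \nu = e_0 = L' = 1$, $m = 2$, $a = 2$). Even with the best $n_b$, the fitted exponent stays near $-1/(m\nu + 1) = -1/3$, above the optimal $-1/(m\nu) = -1/2$. (Balancing the two terms of (III.59) gives $n_b \propto b^{1/(m\nu + 1)}$, which explains this exponent.)

```python
import math


def e_sub(n, b, sigma, nu, m, a, e0, L_prime):
    # (III.59), with N_b eliminated via (III.22): N_b**(1/m) = (b / (sigma n))**(1/(m nu)).
    N_root = (b / (sigma * n)) ** (1.0 / (m * nu))
    return (a - 1.0) / (n - 1.0 + a) * e0 + 2.0 * L_prime * math.sqrt(m) / N_root


def best_e_sub(b, sigma, nu, m, a, e0, L_prime, points=4000):
    # Smallest bound over a geometric grid of n_b in [1, b / sigma] (so that N_b >= 1).
    log_hi = math.log(b / sigma)
    return min(e_sub(math.exp(log_hi * i / (points - 1)), b, sigma, nu, m, a, e0, L_prime)
               for i in range(points))


params = dict(sigma=1.0, nu=1.0, m=2, a=2.0, e0=1.0, L_prime=1.0)   # made-up constants
b1, b2 = 1e12, 1e24
slope = (math.log(best_e_sub(b2, **params)) - math.log(best_e_sub(b1, **params))) / (
    math.log(b2) - math.log(b1))
print(slope)    # compare with -1/(m*nu) = -0.5 and -1/(m*nu + 1) = -1/3
```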


7.  Smoothing  Algorithm  Map 

In  this  subsection,  we  analyze  the  rate  of  decay  of  error  bound  for  Algorithm 
III.  1  using  smoothing  algorithms  as  algorithm  maps.  We  first  repeat  some  of  the 
known  results  on  the  exponential  smoothing  technique  from  Section  II. B,  based  on 
the  assumptions  and  notation  in  this  chapter. 

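For any $p > 0$ and $N \in \mathbb{N}$, we consider a smooth approximating problem to the generally nondifferentiable $(\mathrm{SMX}_N)$,

$$(\mathrm{SMX}_{Np}) \qquad \min_{x \in \mathbb{R}^d} \psi_{Np}(x), \qquad \text{(III.69)}$$

where

$$\begin{aligned}
\psi_{Np}(x) &= \frac{1}{p}\log\left(\sum_{y \in Y_N}\exp\left(p\,\phi(x, y)\right)\right) \qquad \text{(III.70)} \\
&= \psi_N(x) + \frac{1}{p}\log\left(\sum_{y \in Y_N}\exp\left(p\left(\phi(x, y) - \psi_N(x)\right)\right)\right) \qquad \text{(III.71)}
\end{aligned}$$

is the exponential penalty function.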
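Evaluating (III.70) directly overflows for large $p$ (Python's `math.exp` raises `OverflowError` once its argument exceeds about 709). The shifted form (III.71) avoids this, because every exponent $p(\phi(x, y) - \psi_N(x))$ is nonpositive. The sketch below, with random stand-in values for $\phi(x, y)$, $y \in Y_N$, also checks the bounds $0 \le \psi_{Np}(x) - \psi_N(x) \le (\log N)/p$ that are used later in (III.76).

```python
import math
import random


def psi_Np(values, p):
    """Exponential penalty (III.70), evaluated through the shifted form (III.71).

    values: the numbers phi(x, y) for y in Y_N at one fixed x; p > 0.
    """
    psi_N = max(values)                                   # (III.6)
    s = sum(math.exp(p * (v - psi_N)) for v in values)    # every term is at most 1
    return psi_N + math.log(s) / p


rng = random.Random(0)
values = [rng.uniform(-5.0, 5.0) for _ in range(50)]      # stand-in for phi(x, y), y in Y_N
for p in (0.5, 10.0, 1000.0):
    gap = psi_Np(values, p) - max(values)
    print(p, gap, math.log(len(values)) / p)              # 0 <= gap <= (log N) / p
```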

The parameter $p > 0$ is the smoothing precision parameter, where a larger $p$ implies higher precision. With the obvious notational changes, we have a result similar to Proposition II.1 on the bounds for $\psi_{Np}(x) - \psi_N(x)$ and on the differentiability of $\psi_{Np}(\cdot)$.

Assumption III.14. The function $\phi(\cdot, \cdot)$ is twice continuously differentiable on $\mathbb{R}^d \times Y$. □

Lemma III.15. Suppose that Assumption III.14 holds. Then for every bounded set $S \subset \mathbb{R}^d$, there exists an $L < \infty$ such that

$$\langle z, \nabla^2\psi_{Np}(x) z\rangle \le pL\|z\|^2 \qquad \text{(III.72)}$$

for all $x \in S$, $z \in \mathbb{R}^d$, $N \in \mathbb{N}$, and $p \ge 1$.

Proof. The proof follows arguments similar to those in the proof of Lemma II.7 on p. 20. □

When they exist, we denote the optimal value of $(\mathrm{SMX}_{Np})$ by $\psi^*_{Np}$ for any $N \in \mathbb{N}$ and $p > 0$, and the optimal solution of $(\mathrm{SMX}_{Np})$ by $x^*_{Np}$. We denote the $n$th iterate of a sequence generated by an algorithm map when applied to $(\mathrm{SMX}_{Np})$ by $x_n^{Np}$.

Lemma III.16. Suppose that Assumption III.4 holds. For any $x, z \in \mathbb{R}^d$, $N \in \mathbb{N}$, and $p > 0$,

$$\alpha\|z\|^2 \le \langle z, \nabla^2\psi_{Np}(x) z\rangle, \qquad \text{(III.73)}$$

where $\alpha$ satisfies the inequality in Assumption III.4.

Proof. The proof follows the same arguments as the proof of Lemma II.5 on p. 19. □

The Armijo Gradient Method is referenced in the following proposition. It uses the steepest descent search direction and the Armijo stepsize rule to solve an unconstrained problem; see, for example, Algorithm 1.3.3 of Polak (1997).
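For reference, here is a minimal Python sketch of steepest descent with an Armijo stepsize rule, applied to the smoothed problem $(\mathrm{SMX}_{Np})$ for an assumed toy $\phi(x, y) = (x - y)^2$ on an 11-point grid in $[0, 1]$ with $p = 10$. The sufficient-decrease constant 0.5, the backtracking factor 0.8, the stopping tolerance, and the roundoff safeguard are illustrative assumptions, not values taken from Polak (1997).

```python
import math


def armijo_gradient(f, grad, x0, decrease=0.5, shrink=0.8, tol=1e-8, max_iter=10_000):
    """Steepest descent with an Armijo stepsize rule (one common textbook form).

    At x with gradient g, the step is shrink**k for the smallest k >= 0 with
    f(x - shrink**k * g) - f(x) <= -decrease * shrink**k * ||g||**2.
    """
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        g2 = sum(gi * gi for gi in g)
        if math.sqrt(g2) < tol:
            break
        fx = f(x)
        step = 1.0
        while f([xi - step * gi for xi, gi in zip(x, g)]) - fx > -decrease * step * g2:
            step *= shrink
            if step < 1e-16:          # roundoff safeguard, not part of the rule
                return x
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def smoothed_toy(grid, p):
    """psi_Np from (III.71) and its gradient for the assumed toy phi(x, y) = (x - y)**2."""
    def weights(x):
        v = [(x[0] - y) ** 2 for y in grid]
        vmax = max(v)
        return [math.exp(p * (vi - vmax)) for vi in v], vmax

    def f(x):
        w, vmax = weights(x)
        return vmax + math.log(sum(w)) / p

    def grad(x):
        # Softmax-weighted average of d phi / dx = 2 (x - y).
        w, _ = weights(x)
        return [sum(wi * 2.0 * (x[0] - y) for wi, y in zip(w, grid)) / sum(w)]

    return f, grad


grid = [i / 10 for i in range(11)]          # Y_N = {0, 0.1, ..., 1}
f, grad = smoothed_toy(grid, p=10.0)
x_final = armijo_gradient(f, grad, [3.0])
print(x_final)    # the grid is symmetric about 0.5, so the minimizer is x = 0.5
```

Here $\phi(\cdot, y)$ has second derivative 2, so Lemma III.16 applies with $\alpha = 2$, and the iterates converge linearly, as Proposition III.17 predicts.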

Proposition III.17. Suppose that Assumption III.4 holds, $N \in \mathbb{N}$, and $p \ge 1$. Then the Armijo Gradient Method converges linearly when applied to $(\mathrm{SMX}_{Np})$, with coefficient $1 - k/p$ for some $k \in (0, 1)$. That is, for any sequence $\{x_n^{Np}\}_{n=0}^{\infty} \subset \mathbb{R}^d$ generated by the Armijo Gradient Method when applied to $(\mathrm{SMX}_{Np})$, there exists a $k \in (0, 1)$ such that

$$\psi_{Np}(x_{n+1}^{Np}) - \psi^*_{Np} \le \left(1 - \frac{k}{p}\right)\left[\psi_{Np}(x_n^{Np}) - \psi^*_{Np}\right] \quad \text{for all } n \in \mathbb{N}_0. \qquad \text{(III.74)}$$

Proof. The proof follows the same arguments as the proof of Proposition II.8 on p. 22. □

Lemma III.18. Suppose that Assumptions III.3 and III.4 hold. Suppose also that the Armijo Gradient Method is applied to $(\mathrm{SMX}_{Np})$, where $p \ge 1$, $N \in \mathbb{N}$, $N \ge N_1$, and $N_1$ is as defined in Assumption III.3. Then for any $n \in \mathbb{N}$,

$$\psi(x_n^{Np}) - \psi^* \le \left(1 - \frac{k}{p}\right)^n e_0 + 2\Delta_m(N) + \frac{2\log N}{p}, \qquad \text{(III.75)}$$

where $k \in (0, 1)$ is the constant in Proposition III.17.

Proof. Since $\phi(\cdot, y)$ is twice continuously differentiable for all $y \in Y$, the analogue of Proposition II.1 for $\psi_{Np}(\cdot)$ shows that $\psi_{Np}(\cdot)$ is continuous and

$$0 \le \psi_{Np}(x) - \psi_N(x) \le \frac{\log N}{p} \qquad \text{(III.76)}$$

for all $N \in \mathbb{N}$, $p > 0$, and $x \in \mathbb{R}^d$. Based on Proposition III.17,

$$\begin{aligned}
\psi(x_n^{Np}) - \psi^* &\le \psi_N(x_n^{Np}) + \Delta_m(N) - \psi_N(x^*_N) \\
&\le \psi_{Np}(x_n^{Np}) + \Delta_m(N) - \psi_{Np}(x^*_N) + (\log N)/p \\
&\le \psi_{Np}(x_n^{Np}) - \psi^*_{Np} + \Delta_m(N) + (\log N)/p \\
&\le (1 - k/p)^n\left[\psi_{Np}(x_0) - \psi^*_{Np}\right] + \Delta_m(N) + (\log N)/p \\
&\le (1 - k/p)^n\left[\psi_N(x_0) + (\log N)/p - \psi_N(x^*_{Np})\right] + \Delta_m(N) + (\log N)/p \\
&\le (1 - k/p)^n\left[\psi(x_0) + (\log N)/p - \psi(x^*_{Np}) + \Delta_m(N)\right] + \Delta_m(N) + (\log N)/p \\
&\le (1 - k/p)^n\left[\psi(x_0) - \psi(x^*)\right] + 2\Delta_m(N) + (2\log N)/p. \qquad \text{(III.77)}
\end{aligned}$$

This completes the proof. □

From (III.75), we define the error bound for Algorithm III.1 with a smoothing algorithm map as

$$e_b^{\mathrm{smooth}} \triangleq \left(1 - \frac{k}{p_b}\right)^{n_b} e_0 + 2\Delta_m(N_b) + \frac{2\log N_b}{p_b}. \qquad \text{(III.78)}$$

The  next  result  states  that  a  fixed  discretization  algorithm  with  a  smoothing 
algorithm  map  is  unable  to  achieve  the  same  asymptotic  rate  of  decay  of  error  bound 
as  Algorithm  III. 2. 


Theorem III.19. Suppose that Assumptions III.3, III.4, and III.5 hold. Suppose also that a computational budget $b \in \mathbb{N}$ is expended by running $n_b \in \mathbb{N}$ iterations of the Armijo Gradient Method on $(\mathrm{SMX}_{N_b p_b})$, with discretization parameter $N_b \in \mathbb{N}$, $N_b \ge N_1$, where $N_1$ is as defined in Assumption III.3, and smoothing parameter $p_b \ge 1$. Then for all possible sequences $\{(N_b, n_b, p_b)\}_{b \in \mathbb{N}}$ satisfying (III.22),

$$\liminf_{b\to\infty}\frac{\log e_b^{\mathrm{smooth}}}{\log b} > -\frac{1}{m\nu}, \qquad \text{(III.79)}$$

where $m \in \mathbb{N}$ is the uncertainty dimension and $\nu$ is as defined in Assumption III.5.
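Proof. From (III.11) and (III.75),

$$\begin{aligned}
\log e_b^{\text{smooth}} &= \log\left(\exp\left[n_b\log\left(1 - \frac{k}{p_b}\right) + \log e_0\right] + \exp\left[\log\frac{2L'\sqrt{m}}{N_b^{1/m}}\right] + \exp\left[\log\frac{2\log N_b}{p_b}\right]\right)\\
&\ge \log\left(\max\left\{\exp\left[\frac{b}{\sigma N_b^{\nu}}\log\left(1 - \frac{k}{p_b}\right) + \log e_0\right],\ \exp\left[\log\frac{2L'\sqrt{m}}{N_b^{1/m}}\right],\ \exp\left[\log\frac{2\log N_b}{p_b}\right]\right\}\right)\\
&= \max\left\{\frac{b}{\sigma N_b^{\nu}}\log\left(1 - \frac{k}{p_b}\right) + \log e_0,\ \log\frac{2L'\sqrt{m}}{N_b^{1/m}},\ \log\frac{2\log N_b}{p_b}\right\},
\end{aligned} \qquad \text{(III.80)}$$

where the inequality also uses (III.22) and $\log(1 - k/p_b) < 0$. Hence, for any $b \in \mathbb{N}$, $b > 2$,

$$\frac{\log e_b^{\text{smooth}}}{\log b} \ge \max\left\{\frac{b}{\sigma N_b^{\nu}\log b}\log\left(1 - \frac{k}{p_b}\right) + \frac{\log e_0}{\log b},\ \frac{\log 2L'\sqrt{m}}{\log b} - \frac{\log N_b}{m\log b},\ \frac{\log 2}{\log b} + \frac{\log\log N_b}{\log b} - \frac{\log p_b}{\log b}\right\}. \qquad \text{(III.81)}$$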


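For the sake of contradiction, we assume that there exists a sequence $\{N_b, n_b, p_b\}_{b\in\mathbb{N}}$ where

$$\liminf_{b\to\infty}\frac{\log e_b^{\text{smooth}}}{\log b} < -\frac{1}{m\nu}. \qquad \text{(III.82)}$$

This implies that there exists an infinite subsequence $B \subset \mathbb{N}$ such that

$$\lim_{b\to\infty,\ b\in B}\frac{\log e_b^{\text{smooth}}}{\log b} < -\frac{1}{m\nu}, \qquad \text{(III.83)}$$

which further implies that for any $\epsilon > 0$, there exists a $b_1 \in B$ such that

$$\frac{\log e_b^{\text{smooth}}}{\log b} < -\frac{1}{m\nu} + \frac{1}{2}\epsilon \qquad \text{(III.84)}$$

for all $b \in B$, $b > b_1$. From (III.81) and (III.84), we have

$$\frac{b}{\sigma N_b^{\nu}\log b}\log\left(1 - \frac{k}{p_b}\right) + \frac{\log e_0}{\log b} < -\frac{1}{m\nu} + \frac{1}{2}\epsilon, \qquad \text{(III.85)}$$

$$\frac{\log 2L'\sqrt{m}}{\log b} - \frac{\log N_b}{m\log b} < -\frac{1}{m\nu} + \frac{1}{2}\epsilon, \qquad \text{(III.86)}$$

$$\frac{\log 2}{\log b} + \frac{\log\log N_b}{\log b} - \frac{\log p_b}{\log b} < -\frac{1}{m\nu} + \frac{1}{2}\epsilon \qquad \text{(III.87)}$$

for all $b \in B$, $b > \max\{2, b_1\}$.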

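There exists a $b_2 \in B$, $b_2 > b_1$ such that $(\log 2L'\sqrt{m})/\log b > -\frac{1}{2}\epsilon$ and $(\log e_0)/\log b > -\frac{1}{2}\epsilon$ for all $b > b_2$. From (III.85)-(III.87), we get that

$$\frac{b}{\sigma N_b^{\nu}\log b}\log\left(1 - \frac{k}{p_b}\right) < -\frac{1}{m\nu} + \epsilon, \qquad \text{(III.88)}$$

$$-\frac{\log N_b}{m\log b} < -\frac{1}{m\nu} + \epsilon, \qquad \text{(III.89)}$$

$$\frac{\log\log N_b}{\log b} - \frac{\log p_b}{\log b} < -\frac{1}{m\nu} + \epsilon \qquad \text{(III.90)}$$

for all $b \in B$, $b > \max\{2, b_2\}$.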

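From (III.89), for all $b \in B$, $b > \max\{2, b_2\}$,

$$N_b > b^{\frac{1}{\nu} - m\epsilon}. \qquad \text{(III.91)}$$

From (III.90), for all $b \in B$, $b > \max\{2, b_2\}$, with $p_b > 1$,

$$\frac{\log N_b}{p_b} < b^{-\frac{1}{m\nu} + \epsilon}. \qquad \text{(III.92)}$$

This implies that

$$p_b > b^{\frac{1}{m\nu} - \epsilon}\log N_b. \qquad \text{(III.93)}$$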


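Next, we substitute $N_b > b^{1/\nu - m\epsilon}$ from (III.91) into (III.93), and we get

$$p_b > \frac{\log N_b}{b^{-\frac{1}{m\nu} + \epsilon}} > \frac{\log b^{\frac{1}{\nu} - m\epsilon}}{b^{-\frac{1}{m\nu} + \epsilon}} = \left(\frac{1}{\nu} - m\epsilon\right)b^{\frac{1}{m\nu} - \epsilon}\log b \qquad \text{(III.94)}$$

for all $b \in B$, $b > \max\{2, b_2\}$. Using $N_b > b^{1/\nu - m\epsilon}$ from (III.91),

$$\frac{b}{\sigma N_b^{\nu}\log b} < \frac{b}{\sigma\left(b^{\frac{1}{\nu} - m\epsilon}\right)^{\nu}\log b} = \frac{b}{\sigma b^{1 - m\nu\epsilon}\log b} \qquad \text{(III.95)}$$

for all $b \in B$, $b > \max\{2, b_2\}$. Observe from (III.94) that if $\epsilon < 1/(m\nu)$, then $\frac{1}{m\nu} - \epsilon > 0$ and $p_b \to \infty$ as $b \to \infty$. Thus, there exists a $b_3 \in B$, $b_3 > b_2$ such that $k/p_b \in [0, 1/2]$ for all $b > b_3$, and based on Lemma II.13,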


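$$\log\left(1 - \frac{k}{p_b}\right) \ge -\frac{2k}{p_b} \qquad \text{(III.96)}$$

for all $b \in B$, $b > b_3$.

Based on (III.94)-(III.96),

$$\frac{b}{\sigma N_b^{\nu}\log b}\log\left(1 - \frac{k}{p_b}\right) > \frac{b}{\sigma b^{1 - m\nu\epsilon}\log b}\cdot\frac{-2k}{\left(\frac{1}{\nu} - m\epsilon\right)b^{\frac{1}{m\nu} - \epsilon}\log b} = \frac{-2k}{\sigma\left(\frac{1}{\nu} - m\epsilon\right)b^{\frac{1}{m\nu} - m\nu\epsilon - \epsilon}(\log b)^2}. \qquad \text{(III.97)}$$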


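If, in addition, $\epsilon < 1/(m\nu(1 + m\nu))$, then $\frac{1}{m\nu} - m\nu\epsilon - \epsilon > 0$ and

$$\frac{-2k}{\sigma\left(\frac{1}{\nu} - m\epsilon\right)b^{\frac{1}{m\nu} - m\nu\epsilon - \epsilon}(\log b)^2} \to 0 \quad \text{as } b \to \infty.$$

Since the right-hand side of (III.88) equals $-1/(m\nu) + \epsilon < 0$ for such $\epsilon$, there exists a $b_4 \in B$ such that (III.97) contradicts (III.88) for all $b > b_4$. This completes the proof. □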
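For completeness, the elementary inequality used in (III.96) can be verified directly. The derivation below is a sketch of one standard argument and is not necessarily how Lemma II.13 is proved in Chapter II.

$$g(x) \triangleq \log(1 - x) + 2x, \qquad g(0) = 0, \qquad g'(x) = 2 - \frac{1}{1 - x} \ge 0 \quad \text{for } x \in \left[0, \tfrac{1}{2}\right],$$

$$\text{hence } g(x) \ge 0, \text{ i.e., } \log(1 - x) \ge -2x \text{ for all } x \in \left[0, \tfrac{1}{2}\right].$$

Taking $x = k/p_b$ gives (III.96).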


C. EFFICIENCY OF ε-SUBGRADIENT METHOD

Section III.B shows that discretization algorithms for solving (SMX) can obtain at best an asymptotic rate of decay of error bound of $b^{-1/(m\nu)}$ as $b \to \infty$, where $m$ is the uncertainty dimension, $\nu$ is a parameter related to the work per iteration of the algorithm map, and $b$ is the computational budget expended. Hence, discretization methods may perform poorly for (SMX) with large uncertainty dimension. In this section, we show that an ε-subgradient algorithm for (SMX), which relies on additional assumptions as compared to discretization algorithms, has a more favorable rate of decay of error bound for moderate and large $m$.

This section starts with some definitions followed by a description of the ε-subgradient algorithm. We then determine the rate of decay of an error bound based on the ε-subgradient algorithm. Most of the background information on the ε-subgradient algorithm in this section is extracted from Bertsekas (2010).

We start by defining subgradients and subdifferentials.

Definition III.1. Let $f : \mathbb{R}^d \to \mathbb{R}$ be a convex function. A vector $g \in \mathbb{R}^d$ is

(i) a subgradient of $f(\cdot)$ at a point $x \in \mathbb{R}^d$ if

$$f(z) \ge f(x) + (z - x)^T g \qquad \text{(III.98)}$$

for all $z \in \mathbb{R}^d$,

(ii) an ε-subgradient of $f(\cdot)$ ($\epsilon > 0$) at a point $x \in \mathbb{R}^d$ if

$$f(z) \ge f(x) + (z - x)^T g - \epsilon \qquad \text{(III.99)}$$

for all $z \in \mathbb{R}^d$.

(iii) The set of all subgradients of a convex function $f(\cdot)$ at $x \in \mathbb{R}^d$ is called the subdifferential of $f(\cdot)$ at $x$, which is denoted by $\partial f(x)$. □
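Definition III.1(ii) can be checked numerically for a simple function. For a candidate vector $g$, the smallest admissible $\epsilon$ is $\sup_z\,[f(x) + (z - x)^T g - f(z)]$. The sketch below approximates this supremum on a finite set of points $z$ for the hypothetical one-dimensional example $f(z) = |z|$; because only finitely many $z$ are tried, it gives a lower estimate of the true supremum in general. The function name `min_eps` is illustrative.

```python
def min_eps(f, x, g, zs):
    """Estimate the smallest eps for which g is an eps-subgradient of f at x.

    Uses eps(g) = sup_z [f(x) + (z - x) * g - f(z)] (one-dimensional case),
    approximated over the finite set zs; the true supremum can only be larger.
    """
    return max(0.0, max(f(x) + (z - x) * g - f(z) for z in zs))

zs = [k / 1000.0 for k in range(-2000, 2001)]   # grid on [-2, 2], includes z = 0

# g = 1 is a subgradient of |.| at x = 0.1, so the estimate is (numerically) zero.
print(min_eps(abs, 0.1, 1.0, zs))
# g = 0.5 is not a subgradient at x = 0.1, but it is an eps-subgradient for eps >= 0.05.
print(min_eps(abs, 0.1, 0.5, zs))
```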

We consider the following ε-subgradient algorithm.

Algorithm III.3. ε-Subgradient Algorithm

Data: $x_0 \in \mathbb{R}^d$.

Parameters: $\alpha > 0$, $\epsilon > 0$.

Step 1. Set $i = 0$.

Step 2. Compute $y_i \in Y$ such that

$$\phi(x_i, y_i) \ge \psi(x_i) - \epsilon. \qquad \text{(III.100)}$$

Step 3. Determine the next iterate

$$x_{i+1} = x_i - \alpha\nabla_x\phi(x_i, y_i). \qquad \text{(III.101)}$$

Step 4. Replace $i$ by $i + 1$, and go to Step 2. □

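The key step of Algorithm III.3 is Step 2, where we find a $y_i \in Y$ such that $\phi(x_i, y_i)$ is within $\epsilon$ of $\psi(x_i)$. Under the assumption that $\phi(\cdot, y)$ is convex for all $y \in Y$, the search direction $\nabla_x\phi(x_i, y_i)$ is an ε-subgradient of $\psi(\cdot)$ at $x_i$: for all $z \in \mathbb{R}^d$, $\psi(z) \ge \phi(z, y_i) \ge \phi(x_i, y_i) + (z - x_i)^T\nabla_x\phi(x_i, y_i) \ge \psi(x_i) + (z - x_i)^T\nabla_x\phi(x_i, y_i) - \epsilon$.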
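To make Steps 2 and 3 concrete, the following sketch runs Algorithm III.3 with a constant stepsize on a hypothetical one-dimensional convex-concave instance, $\phi(x, y) = \frac{1}{2}(x - 1)^2 + xy - y^2$ with $Y = [-1, 1]$, for which $\psi^* = 1/6$ at $x^* = 2/3$. The toy instance, the function names, and the deliberately inexact Step 2 routine are illustrative assumptions; in the setting of this chapter, Step 2 would call an iterative maximization method.

```python
import math

# Hypothetical convex-concave toy instance (d = m = 1):
#   phi(x, y) = 0.5*(x - 1)^2 + x*y - y^2,  Y = [-1, 1].
# phi(., y) is convex and phi(x, .) is concave, as in Assumption III.20.

def phi(x, y):
    return 0.5 * (x - 1.0) ** 2 + x * y - y ** 2

def grad_x_phi(x, y):
    return (x - 1.0) + y

def psi(x):
    """psi(x) = max over Y of phi(x, y); the exact maximizer is clip(x/2, -1, 1)."""
    return phi(x, min(1.0, max(-1.0, x / 2.0)))

def eps_maximizer(x, eps):
    """Step 2: return y in Y with phi(x, y) >= psi(x) - eps, as in (III.100).

    The shift 0.5*sqrt(eps) makes y deliberately inexact; for this phi the loss
    psi(x) - phi(x, y) is at most eps/4, so y is an eps-maximizer.
    """
    return min(1.0, max(-1.0, x / 2.0 + 0.5 * math.sqrt(eps)))

def eps_subgradient(x0, alpha, eps, iters):
    """Algorithm III.3 with constant stepsize alpha; returns all iterates."""
    xs = [x0]
    x = x0
    for _ in range(iters):
        y = eps_maximizer(x, eps)              # Step 2
        x = x - alpha * grad_x_phi(x, y)       # Step 3, cf. (III.101)
        xs.append(x)
    return xs

xs = eps_subgradient(x0=3.0, alpha=0.01, eps=1e-4, iters=2000)
best = min(psi(x) for x in xs)
print(best - 1.0 / 6.0)   # gap to psi* = 1/6, attained at x* = 2/3
```

Because Step 2 is inexact, the iterates settle near $x = 0.995/1.5 \approx 0.6633$ rather than exactly at $x^* = 2/3$; the resulting gap in $\psi$ stays below $\epsilon$, consistent with the $\epsilon$ term in (III.104).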
In Algorithm III.3, we use a constant stepsize $\alpha$. There are two other stepsize rules for subgradient algorithms: diminishing stepsize and dynamically chosen stepsize. We refer to Bertsekas (2010, pp. 272-274) for a detailed discussion of the advantages and disadvantages of the three schemes. In the theoretical and numerical results that follow, we see that even though only the simplest scheme, a constant stepsize, is considered, the rate of decay of error bound for the ε-subgradient algorithm is fundamentally better than that of the discretization approach in one key respect: it does not depend on the uncertainty dimension. However, as stated next, we need an additional concavity assumption for the ε-subgradient algorithm.

Assumption III.20. The functions $\phi(\cdot, y)$ are convex for all $y \in Y$, and the functions $\phi(x, \cdot)$ are concave for all $x \in \mathbb{R}^d$. □

The above assumption is needed because subgradient algorithms only handle convex problems. In addition, the concavity assumption is required to ensure that the global maximization step in Step 2 of Algorithm III.3 can be completed in finite time. By comparison, discretization algorithms require strong convexity of $\phi(\cdot, y)$ for all $y \in Y$, but no concavity assumption. We refer to problems that satisfy Assumption III.20 as convex-concave problems.

We  also  need  the  following  assumption  on  the  boundedness  of  the  subgradients. 

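Assumption III.21. For any bounded set $S \subset \mathbb{R}^d$, there exists an $s < \infty$ such that

$$\sup_{i\in\mathbb{N}_0,\ x_i\in S}\left\{\|g_i\| \;\middle|\; g_i \in \partial\psi(x_i)\right\} \le s. \qquad \text{(III.102)}$$

□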

We  obtain  the  following  convergence  result  for  Algorithm  III. 3  from  Bertsekas 
(2010,  p.  349). 
Proposition III.22. Suppose that for all $y \in Y$, $\phi(\cdot, y)$ is convex on $\mathbb{R}^d$. If $\{x_i\}_{i\in\mathbb{N}_0}$ is a sequence generated by Algorithm III.3, then for all $i \in \mathbb{N}_0$ and $x \in \mathbb{R}^d$,

$$\|x - x_{i+1}\|^2 \le \|x - x_i\|^2 - 2\alpha\left[\psi(x_i) - \psi(x) - \epsilon\right] + \alpha^2\|g_i\|^2, \qquad \text{(III.103)}$$

where $\alpha$ and $\epsilon > 0$ are as in Algorithm III.3, and $g_i \in \mathbb{R}^d$ is an ε-subgradient of $\psi(\cdot)$ at $x_i$. □

We denote the optimal solution set of (SMX) by $X^* = \{x \in \mathbb{R}^d \mid \psi(x) = \psi^*\}$ and the distance of the initial point $x_0$ to $X^*$ by $d(x_0) = \min_{x\in X^*}\|x_0 - x\|$. We follow the arguments in the convergence analyses of Bertsekas (2010, Section 6.3) on the subgradient algorithm to derive the following two convergence results for the ε-subgradient algorithm, Algorithm III.3.


Proposition III.23. Suppose that Assumption III.21 holds, and that $\phi(\cdot, y)$ are convex for all $y \in Y$. If the sequence $\{x_i\}_{i\in\mathbb{N}_0}$ is generated by Algorithm III.3 in solving (SMX), then

$$\liminf_{i\to\infty}\psi(x_i) \le \psi^* + \frac{\alpha s^2}{2} + \epsilon, \qquad \text{(III.104)}$$

where $\alpha$ and $\epsilon > 0$ are as in Algorithm III.3, and $s$ is as in Assumption III.21.

Proof. The proof follows the same arguments as that for the subgradient algorithm in Bertsekas (2010, p. 275), with the difference that (III.103) is used here for the ε-subgradient algorithm instead of the corresponding inequality for the subgradient algorithm. □

The next result gives an estimate of the number of iterations required by Algorithm III.3 to attain an error tolerance of $(\alpha s^2/2) + \epsilon + \epsilon'/2$, for any $\epsilon' > 0$.

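Theorem III.24. Suppose that the functions $\phi(\cdot, y)$ are convex for all $y \in Y$. Suppose also that Assumption III.21 holds, and the sequence $\{x_i\}_{i\in\mathbb{N}_0}$ is generated by Algorithm III.3 in solving (SMX). If $X^*$ is nonempty, then for any $\epsilon' > 0$,

$$\min_{0\le i\le K}\psi(x_i) - \psi^* \le \frac{\alpha s^2 + \epsilon'}{2} + \epsilon, \qquad \text{(III.105)}$$

where

$$K = \left\lfloor\frac{d(x_0)^2}{\alpha\epsilon'}\right\rfloor, \qquad \text{(III.106)}$$

$\alpha$ and $\epsilon > 0$ are as in Algorithm III.3, and $s$ is as in Assumption III.21.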


Proof. We follow the proof for the subgradient algorithm in Bertsekas (2010, Proposition 6.3.3), with (III.103) replacing inequality 6.3.1(a) of Bertsekas (2010). For the sake of contradiction, we assume that (III.105) does not hold. Thus, for all $i$ such that $0 \le i \le K$,

$$\psi(x_i) - \psi^* - \epsilon > \frac{\alpha s^2 + \epsilon'}{2}. \qquad \text{(III.107)}$$

Using this relation in (III.103), with $x \in X^*$, we obtain for all $i$ such that $0 \le i \le K$,

$$\begin{aligned}
\min_{x^*\in X^*}\|x_{i+1} - x^*\|^2 &\le \min_{x^*\in X^*}\|x_i - x^*\|^2 - 2\alpha\left[\psi(x_i) - \psi^* - \epsilon\right] + \alpha^2 s^2\\
&\le \min_{x^*\in X^*}\|x_i - x^*\|^2 - 2\alpha\,\frac{\alpha s^2 + \epsilon'}{2} + \alpha^2 s^2\\
&= \min_{x^*\in X^*}\|x_i - x^*\|^2 - \alpha\epsilon'.
\end{aligned} \qquad \text{(III.108)}$$

Applying (III.108) recursively, we obtain

$$0 \le \min_{x^*\in X^*}\|x_{K+1} - x^*\|^2 \le \min_{x^*\in X^*}\|x_0 - x^*\|^2 - (K + 1)\alpha\epsilon'. \qquad \text{(III.109)}$$

Solving for $K$ gives

$$K \le \frac{\min_{x^*\in X^*}\|x_0 - x^*\|^2}{\alpha\epsilon'} - 1 = \frac{d(x_0)^2}{\alpha\epsilon'} - 1, \qquad \text{(III.110)}$$

which contradicts (III.106). □
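As a quick arithmetic illustration of (III.105) and (III.106), the helpers below (hypothetical names) evaluate the iteration count $K$ and the guaranteed optimality gap for given $d(x_0)$, $\alpha$, $s$, $\epsilon$, and $\epsilon'$. They simply transcribe the two formulas; the numbers in the example are arbitrary.

```python
import math

def iteration_count(d0, alpha, eps_prime):
    """K = floor(d(x0)^2 / (alpha * eps')) as in (III.106)."""
    return math.floor(d0 ** 2 / (alpha * eps_prime))

def guaranteed_gap(alpha, s, eps, eps_prime):
    """Right-hand side of (III.105): (alpha * s^2 + eps') / 2 + eps."""
    return (alpha * s ** 2 + eps_prime) / 2.0 + eps

# Arbitrary example values: d(x0) = 2, alpha = 0.5, eps' = 0.25, s = 2, eps = 0.125.
print(iteration_count(2.0, 0.5, 0.25))        # 32
print(guaranteed_gap(0.5, 2.0, 0.125, 0.25))  # 1.25
```

Halving $\alpha$ halves the $\alpha s^2/2$ part of the gap but roughly doubles $K$, which is the trade-off that the computational-work analysis below makes explicit.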

The following assumption regarding the computational work required for function and gradient evaluations provides the basis for analyzing the computational work required for Algorithm III.3 to solve (SMX).

Assumption III.25. There exist constants $\gamma, a', a'' < \infty$ such that for any $x \in \mathbb{R}^d$ and $y \in Y \subset \mathbb{R}^m$, the computational work to evaluate any of the three functions $\phi(x, y)$, $\nabla_x\phi(x, y)$, or $\nabla_y\phi(x, y)$ is no larger than $\gamma m^{a'}d^{a''}$. □
The following assumption ensures that Algorithm III.3 generates bounded sequences.

Assumption III.26. The level set

$$L(x_0) = \{x \mid \psi(x) \le \psi(x_0)\} \qquad \text{(III.111)}$$

is bounded for all $x_0 \in \mathbb{R}^d$, where $x_0$ is as in Algorithm III.3. □

We refer to the iterations of Algorithm III.3 as major iterations and the iterations of the algorithm map in Step 2 of Algorithm III.3 as minor iterations. We denote the $j$th minor iterate during the $i$th major iteration as $y_{i,j}$.


Assumption III.27. A linearly convergent algorithm map is used in Step 2 of Algorithm III.3, i.e., there exists a $c \in (0, 1)$ such that

$$\frac{\psi(x_i) - \phi(x_i, y_{i,j+1})}{\psi(x_i) - \phi(x_i, y_{i,j})} \le c \qquad \text{(III.112)}$$

for all $j \ge 1$. In addition, the computational work in the linearly convergent algorithm is no larger than a constant number of function and gradient evaluations at each iteration. □

We define

$$\epsilon_0^{\max} = \sup_{x\in\mathbb{R}^d,\ y_1, y_2\in Y}\left|\phi(x, y_1) - \phi(x, y_2)\right|. \qquad \text{(III.113)}$$

The next result provides an upper bound on the computational work of Algorithm III.3.


Theorem III.28. Suppose that Assumptions III.2, III.14, III.20, III.25, III.26, and III.27 hold, and $X^*$ is nonempty. Then for any $x_0 \in \mathbb{R}^d$ and $\epsilon, \epsilon', \alpha > 0$, there exist constants $a, a', a'', c', c'' < \infty$ such that the computational work in Algorithm III.3 to generate $\{x_i\}_{i=0}^{K}$ to solve (SMX), while satisfying (III.105) and (III.106), is no larger than

$$\frac{a}{\alpha\epsilon'}\left[m^{c'}d^{c''} + m^{a'}d^{a''}\left(\frac{\log(\epsilon/\epsilon_0^{\max})}{\log c} + 1\right)\right] \quad \text{if } \epsilon < \epsilon_0^{\max}, \qquad \text{(III.114)}$$

and

$$\frac{a\,m^{c'}d^{c''}}{\alpha\epsilon'} \quad \text{otherwise}. \qquad \text{(III.115)}$$


Proof. The main computational work in Algorithm III.3 is in Step 2, to determine an ε-maximizer $y_i$. Since $Y$ is bounded, and for all $x \in \mathbb{R}^d$, $\phi(x, \cdot)$ is globally Lipschitz continuous according to Assumption III.2, $\epsilon_0^{\max} < \infty$. Based on Assumption III.27, the number of iterations required to obtain $y_i$ such that $\psi(x_i) - \phi(x_i, y_i) \le \epsilon$ is no larger than $(\log(\epsilon/\epsilon_0^{\max})/\log c) + 1$ if $\epsilon < \epsilon_0^{\max}$, and equals zero if $\epsilon \ge \epsilon_0^{\max}$. Since the computational work in the linearly convergent algorithm is no larger than a constant number of function and gradient evaluations at each iteration, based on Assumption III.25, there exist constants $\gamma, a', a'' < \infty$ such that the computational work at each iteration is no larger than $\gamma m^{a'}d^{a''}$.

The main computational work in Step 3 of Algorithm III.3 is the computation of the gradient $\nabla_x\phi(x_i, y_i)$, and based on Assumption III.25, there exist constants $\zeta, c', c'' < \infty$ such that the computational work for Step 3 is no larger than $\zeta m^{c'}d^{c''}$.

Based on Assumption III.26 and (III.106), there exists a $\beta < \infty$ such that

$$K = \left\lfloor\frac{d(x_0)^2}{\alpha\epsilon'}\right\rfloor \le \frac{d(x_0)^2}{\alpha\epsilon'} \le \frac{\beta}{\alpha\epsilon'}. \qquad \text{(III.116)}$$


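Thus the overall computational work for Algorithm III.3 to generate $\{x_i\}_{i=0}^{K}$, satisfying (III.105) and (III.106), is no larger than

$$\frac{a}{\alpha\epsilon'}\left[m^{c'}d^{c''} + m^{a'}d^{a''}\left(\frac{\log(\epsilon/\epsilon_0^{\max})}{\log c} + 1\right)\right] \quad \text{if } \epsilon < \epsilon_0^{\max}, \qquad \text{(III.117)}$$

where $a = \max\{\beta\zeta, \beta\gamma\}$, and

$$\frac{a\,m^{c'}d^{c''}}{\alpha\epsilon'} \quad \text{otherwise}, \qquad \text{(III.118)}$$

where $a = \beta\zeta$. □

From (III.105), we define the error bound for Algorithm III.3 as

$$e_b^{\text{subgrad}} \triangleq \frac{\alpha_b s^2}{2} + \epsilon_b + \frac{\epsilon_b'}{2}, \qquad \text{(III.119)}$$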
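with $\alpha_b, \epsilon_b, \epsilon_b' > 0$ satisfying the following inequalities for $b \in \mathbb{N}$:

$$\frac{a}{\alpha_b\epsilon_b'}\left[m^{c'}d^{c''} + m^{a'}d^{a''}\left(\frac{\log(\epsilon_b/\epsilon_0^{\max})}{\log c} + 1\right)\right] \le b \quad \text{if } \epsilon_b < \epsilon_0^{\max}, \qquad \text{(III.120)}$$

$$\frac{a\,m^{c'}d^{c''}}{\alpha_b\epsilon_b'} \le b \quad \text{otherwise}. \qquad \text{(III.121)}$$

We call a sequence $\{\epsilon_b, \epsilon_b', \alpha_b\}_{b=1}^{\infty}$ a feasible sequence if it satisfies (III.120) and (III.121).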
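The budget constraints (III.120)-(III.121) can be evaluated mechanically. The sketch below transcribes the work bound of Theorem III.28 and checks feasibility for one value of $b$. The constants $a$, $a'$, $a''$, $c'$, $c''$, $c$, and $\epsilon_0^{\max}$ are problem dependent and generally unknown, so the values used here are hypothetical placeholders, as are the function names.

```python
import math

def work_bound(eps, eps_prime, alpha, *, m, d, a, a1, a2, c1, c2, c, eps_max):
    """Upper bound (III.114)-(III.115) on the work of Algorithm III.3.

    a1, a2 stand for a', a''; c1, c2 stand for c', c''; c in (0, 1) is the linear
    rate of the Step 2 method (Assumption III.27); eps_max stands for eps_0^max.
    """
    step3 = m ** c1 * d ** c2                      # work of one gradient step (Step 3)
    if eps < eps_max:
        minor = math.log(eps / eps_max) / math.log(c) + 1.0   # bound on minor iterations
        return a / (alpha * eps_prime) * (step3 + m ** a1 * d ** a2 * minor)
    return a * step3 / (alpha * eps_prime)        # Step 2 needs no minor iterations

def is_feasible(b, eps, eps_prime, alpha, **consts):
    """Check the budget constraints (III.120)-(III.121) for a single b."""
    return work_bound(eps, eps_prime, alpha, **consts) <= b

# Hypothetical constants, chosen only to exercise both branches.
consts = dict(m=1, d=1, a=1.0, a1=1, a2=1, c1=1, c2=1, c=0.5, eps_max=1.0)
print(work_bound(0.25, 0.5, 0.5, **consts))       # about 16.0
print(work_bound(2.0, 0.5, 0.5, **consts))        # 4.0
print(is_feasible(10, 0.25, 0.5, 0.5, **consts))  # False
```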

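The next result states that Algorithm III.3 achieves an asymptotic rate of decay of error bound $e_b^{\text{subgrad}}$ of $b^{-1/2}$ as $b \to \infty$.

Theorem III.29. Suppose that Assumptions III.2, III.14, III.20, III.21, III.25, III.26, and III.27 hold. If Algorithm III.3 is used to solve (SMX), then for all feasible sequences $\{\epsilon_b, \epsilon_b', \alpha_b\}_{b\in\mathbb{N}}$,

$$\liminf_{b\to\infty}\frac{\log e_b^{\text{subgrad}}}{\log b} \ge -\frac{1}{2}. \qquad \text{(III.122)}$$

For all $b \in \mathbb{N}$, if $\epsilon_b = \epsilon_b' = m/\sqrt{b}$ and if

$$\alpha_b = \frac{a}{m\sqrt{b}}\left[m^{c'}d^{c''} + m^{a'}d^{a''}\left(\frac{\log(\epsilon_b/\epsilon_0^{\max})}{\log c} + 1\right)\right] \quad \text{whenever } \epsilon_b < \epsilon_0^{\max}, \qquad \text{(III.123)}$$

$$\alpha_b = \frac{a\,m^{c'}d^{c''}}{m\sqrt{b}} \quad \text{otherwise}, \qquad \text{(III.124)}$$

then

$$\lim_{b\to\infty}\frac{\log e_b^{\text{subgrad}}}{\log b} = -\frac{1}{2}. \qquad \text{(III.125)}$$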

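Proof. For any feasible sequence $\{\epsilon_b, \epsilon_b', \alpha_b\}_{b\in\mathbb{N}}$, we first consider the set $A \subset \mathbb{N}$, where $\epsilon_b \ge \epsilon_0^{\max}$ for all $b \in A$. Based on (III.119), $e_b^{\text{subgrad}} > \epsilon_b \ge \epsilon_0^{\max}$ for all $b \in A$. Since $\epsilon_0^{\max} > 0$ by definition, there exists a $b_0 \in \mathbb{N}$ such that

$$\frac{\log e_b^{\text{subgrad}}}{\log b} > -\frac{1}{2} \qquad \text{(III.126)}$$

for all $b \in A$, $b > b_0$.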
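Next, we consider those $b \in \mathbb{N} - A$ (where '$-$' represents the set difference operator), where $\epsilon_b < \epsilon_0^{\max}$. Based on (III.120), we obtain that

$$\alpha_b \ge \frac{a}{b\,\epsilon_b'}\left[m^{c'}d^{c''} + m^{a'}d^{a''}\left(\frac{\log(\epsilon_b/\epsilon_0^{\max})}{\log c} + 1\right)\right]. \qquad \text{(III.127)}$$

We substitute (III.127) into (III.119), and we obtain that

$$e_b^{\text{subgrad}} \ge \frac{a\,m^{c'}d^{c''}s^2}{2b\epsilon_b'} + \frac{a\,m^{a'}d^{a''}s^2}{2b\epsilon_b'}\left(\frac{\log(\epsilon_b/\epsilon_0^{\max})}{\log c} + 1\right) + \epsilon_b + \frac{\epsilon_b'}{2}. \qquad \text{(III.128)}$$

Taking logs on both sides, we obtain that

$$\begin{aligned}
\log e_b^{\text{subgrad}} &\ge \log\Big(\exp\left[\log a m^{c'}d^{c''}s^2 - \log 2b\epsilon_b'\right] + \exp\left[\log a m^{a'}d^{a''}s^2 - \log 2b\epsilon_b' + \log\left(\frac{\log(\epsilon_b/\epsilon_0^{\max})}{\log c} + 1\right)\right]\\
&\qquad\quad + \exp\left[\log\epsilon_b\right] + \exp\left[\log\frac{\epsilon_b'}{2}\right]\Big)\\
&\ge \log\Big(\max\Big\{\exp\left[\log a m^{c'}d^{c''}s^2 - \log 2b\epsilon_b'\right],\ \exp\left[\log a m^{a'}d^{a''}s^2 - \log 2b\epsilon_b' + \log\left(\frac{\log(\epsilon_b/\epsilon_0^{\max})}{\log c} + 1\right)\right],\\
&\qquad\quad \exp\left[\log\epsilon_b\right],\ \exp\left[\log\frac{\epsilon_b'}{2}\right]\Big\}\Big)\\
&= \max\Big\{\log a m^{c'}d^{c''}s^2 - \log 2b\epsilon_b',\ \log a m^{a'}d^{a''}s^2 - \log 2b\epsilon_b' + \log\left(\frac{\log(\epsilon_b/\epsilon_0^{\max})}{\log c} + 1\right),\ \log\epsilon_b,\ \log\frac{\epsilon_b'}{2}\Big\}.
\end{aligned} \qquad \text{(III.129)}$$

Hence, for any $b \in \mathbb{N} - A$, $b > 1$, we obtain that

$$\frac{\log e_b^{\text{subgrad}}}{\log b} \ge \max\left\{\frac{\log a m^{c'}d^{c''}s^2}{\log b} - \frac{\log 2b\epsilon_b'}{\log b},\ \frac{\log a m^{a'}d^{a''}s^2}{\log b} - \frac{\log 2b\epsilon_b'}{\log b} + \frac{\log\left(\frac{\log(\epsilon_b/\epsilon_0^{\max})}{\log c} + 1\right)}{\log b},\ \frac{\log\epsilon_b}{\log b},\ \frac{\log(\epsilon_b'/2)}{\log b}\right\}. \qquad \text{(III.130)}$$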


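For the sake of contradiction, we assume that there exists a feasible sequence $\{\epsilon_b, \epsilon_b', \alpha_b\}_{b\in\mathbb{N}-A}$ such that

$$\liminf_{b\to\infty}\frac{\log e_b^{\text{subgrad}}}{\log b} < -\frac{1}{2}. \qquad \text{(III.131)}$$

This implies that there exists a $\delta > 0$, a $b_1 > 1$, and an infinite subsequence $B \subset \mathbb{N} - A$ such that

$$\frac{\log e_b^{\text{subgrad}}}{\log b} < -\frac{1}{2} - \delta \qquad \text{(III.132)}$$

for all $b > b_1$, $b \in B$. From (III.130) and (III.132), we obtain that

$$\frac{\log a m^{c'}d^{c''}s^2}{\log b} - \frac{\log 2b\epsilon_b'}{\log b} < -\frac{1}{2} - \delta, \qquad \text{(III.133)}$$

$$\frac{\log a m^{a'}d^{a''}s^2}{\log b} - \frac{\log 2b\epsilon_b'}{\log b} + \frac{\log\left(\frac{\log(\epsilon_b/\epsilon_0^{\max})}{\log c} + 1\right)}{\log b} < -\frac{1}{2} - \delta, \qquad \text{(III.134)}$$

$$\frac{\log\epsilon_b}{\log b} < -\frac{1}{2} - \delta, \qquad \text{(III.135)}$$

and

$$\frac{\log(\epsilon_b'/2)}{\log b} < -\frac{1}{2} - \delta \qquad \text{(III.136)}$$

for all $b > b_1$, $b \in B$. Since $\epsilon_b < \epsilon_0^{\max}$ and $c \in (0, 1)$, the last term on the left-hand side of (III.134) is positive; moreover, $(\log a m^{a'}d^{a''}s^2)/\log b \to 0$ as $b \to \infty$. Hence, by (III.134) and (III.136), there exists a $b_2 > b_1$ such that

$$-\frac{1}{2} + \delta - \frac{\log 4}{\log b} < -\frac{\log 2b\epsilon_b'}{\log b} < -\frac{1}{2} \qquad \text{(III.137)}$$

and

$$\frac{\log 4}{\log b} < \delta \qquad \text{(III.138)}$$

for all $b > b_2$, $b \in B$. From (III.137), we obtain that

$$\delta < \frac{\log 4}{\log b}, \qquad \text{(III.139)}$$

which contradicts (III.138).

Thus, the conclusion follows for the first part of the theorem.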

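Next, we prove the second part of the theorem. Since $\epsilon_b = m/\sqrt{b}$, there exists a $b_1 \in \mathbb{N}$ such that $\epsilon_b < \epsilon_0^{\max}$ for all $b > b_1$. Since we are concerned with the asymptotic rate of decay of error bound $e_b^{\text{subgrad}}$ when $b \to \infty$, we only need to consider the case where $\epsilon_b < \epsilon_0^{\max}$. Thus, substituting $\epsilon_b = \epsilon_b' = m/\sqrt{b}$ and $\alpha_b$ as in (III.123) into (III.119), we obtain that

$$\begin{aligned}
e_b^{\text{subgrad}} &= \frac{a s^2}{2m\sqrt{b}}\left[m^{c'}d^{c''} + m^{a'}d^{a''}\left(\frac{\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right)}{\log c} + 1\right)\right] + \frac{m}{\sqrt{b}} + \frac{m}{2\sqrt{b}}\\
&= \frac{1}{\sqrt{b}}\left[\frac{a m^{c'}d^{c''}s^2}{2m} + \frac{a m^{a'}d^{a''}s^2}{2m}\left(\frac{\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right)}{\log c} + 1\right) + \frac{3m}{2}\right].
\end{aligned} \qquad \text{(III.140)}$$

Taking logs on both sides of (III.140), we obtain that

$$\log e_b^{\text{subgrad}} = \log\frac{1}{\sqrt{b}} + \log\left[\frac{a m^{c'}d^{c''}s^2}{2m} + \frac{a m^{a'}d^{a''}s^2}{2m}\left(\frac{\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right)}{\log c} + 1\right) + \frac{3m}{2}\right]. \qquad \text{(III.141)}$$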
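We consider the second term in (III.141), and obtain that

$$\begin{aligned}
&\lim_{b\to\infty}\log\left[\frac{a m^{c'}d^{c''}s^2}{2m} + \frac{a m^{a'}d^{a''}s^2}{2m}\left(\frac{\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right)}{\log c} + 1\right) + \frac{3m}{2}\right]\\
&\quad= \lim_{b\to\infty}\log\left[\frac{a m^{a'}d^{a''}s^2}{2m}\left(\frac{\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right)}{\log c} + 1\right)\left(\frac{\frac{a m^{c'}d^{c''}s^2}{2m} + \frac{3m}{2}}{\frac{a m^{a'}d^{a''}s^2}{2m}\left(\frac{\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right)}{\log c} + 1\right)} + 1\right)\right].
\end{aligned} \qquad \text{(III.142)}$$

Based on the continuity of the log function, and the fact that

$$\lim_{b\to\infty}\frac{\frac{a m^{c'}d^{c''}s^2}{2m} + \frac{3m}{2}}{\frac{a m^{a'}d^{a''}s^2}{2m}\left(\frac{\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right)}{\log c} + 1\right)} = 0, \qquad \text{(III.143)}$$

the right-hand side of (III.142) simplifies to

$$\lim_{b\to\infty}\log\left[\frac{a m^{a'}d^{a''}s^2}{2m}\left(\frac{\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right)}{\log c} + 1\right)\right]. \qquad \text{(III.144)}$$

From (III.141) and (III.144), we obtain that

$$\lim_{b\to\infty}\frac{\log e_b^{\text{subgrad}}}{\log b} = \lim_{b\to\infty}\frac{\log(1/\sqrt{b})}{\log b} + \lim_{b\to\infty}\frac{\log\left(\frac{\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right)}{\log c} + 1\right)}{\log b} = -\frac{1}{2} + \lim_{b\to\infty}\frac{\log\left(\frac{\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right)}{\log c} + 1\right)}{\log b}, \qquad \text{(III.145)}$$

as $\lim_{b\to\infty}\log\left(\frac{a m^{a'}d^{a''}s^2}{2m}\right)/\log b = 0$. Applying L'Hôpital's rule on the second term of the right-hand side, we obtain that

$$\lim_{b\to\infty}\frac{\log\left(\frac{\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right)}{\log c} + 1\right)}{\log b} = \lim_{b\to\infty}\frac{-\frac{1}{2b\log c}\Big/\left(\frac{\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right)}{\log c} + 1\right)}{1/b} = \lim_{b\to\infty}\frac{-1}{2\left(\log\left(m/(\epsilon_0^{\max}\sqrt{b})\right) + \log c\right)} = 0, \qquad \text{(III.146)}$$

and the conclusion follows.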


□ 
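The slow approach to the limit in (III.125) can be seen by evaluating the error bound for the parameter choice of Theorem III.29. The sketch below substitutes $\epsilon_b = \epsilon_b' = m/\sqrt{b}$ and $\alpha_b$ from (III.123) into (III.119); all constants are hypothetical placeholders, and the function name is illustrative.

```python
import math

def e_subgrad(b, *, m, d, a, a1, a2, c1, c2, c, eps_max, s):
    """Error bound (III.119) with eps_b = eps'_b = m / sqrt(b) and alpha_b from (III.123).

    a1, a2 stand for a', a''; c1, c2 for c', c''; eps_max for eps_0^max.
    """
    eps_b = m / math.sqrt(b)
    if eps_b >= eps_max:
        raise ValueError("(III.123) needs m / sqrt(b) < eps_max; see (III.124) otherwise")
    minor = math.log(eps_b / eps_max) / math.log(c) + 1.0
    alpha_b = a / (m * math.sqrt(b)) * (m ** c1 * d ** c2 + m ** a1 * d ** a2 * minor)
    return alpha_b * s ** 2 / 2.0 + eps_b + eps_b / 2.0

# Hypothetical placeholder constants.
consts = dict(m=1, d=1, a=1.0, a1=1, a2=1, c1=1, c2=1, c=0.5, eps_max=1.0, s=1.0)
for b in (1e4, 1e12, 1e100):
    ratio = math.log(e_subgrad(b, **consts)) / math.log(b)
    print(f"b = {b:.0e}: log(e_b) / log(b) = {ratio:.4f}")
```

With these placeholder constants, the printed ratios stay above $-1/2$ and decrease toward it only slowly as $b$ grows, reflecting the logarithmic factor treated in (III.146).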


We see from Theorem III.29 that a certain choice of the parameters $\epsilon_b$, $\epsilon_b'$, and $\alpha_b$, $b \in \mathbb{N}$, results in an asymptotic rate of decay of error bound $e_b^{\text{subgrad}}$ of $b^{-1/2}$. If we compare this against the fastest rate of decay for a discretization algorithm, $b^{-1/(m\nu)}$, we see that for moderate and large $m$, discretization algorithms may not be competitive against Algorithm III.3. This difference in the rate of decay of error bound is observed in the numerical results in the next section.
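The comparison of exponents is simple arithmetic: the discretization bound $b^{-1/(m\nu)}$ decays faster than $b^{-1/2}$ only when $m\nu < 2$. The sketch below encodes that comparison. The value $\nu = 1$ in the example is a hypothetical placeholder, since $\nu$ depends on the algorithm map (Assumption III.5), and the comparison ignores the constants hidden in both bounds.

```python
def rate_exponents(m, nu):
    """Exponents of b in the two error-bound rates: discretization vs. eps-subgradient."""
    return -1.0 / (m * nu), -0.5

def faster_method(m, nu):
    """Which bound decays faster as b -> infinity ('tie' if the exponents match)."""
    disc, sub = rate_exponents(m, nu)
    if disc < sub:
        return "discretization"
    if disc > sub:
        return "eps-subgradient"
    return "tie"

for m in (1, 2, 3, 5):
    print(m, faster_method(m, nu=1.0))
```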

D.  NUMERICAL  RESULTS 

In this section, we provide numerical evidence to validate some of the key theoretical results obtained in Sections III.B and III.C. Proposition III.7 indicates that the asymptotic rate of decay of error bounds for discretization algorithms is no faster than $b^{-1/(m\nu)}$ as $b \to \infty$. We compare that with the ε-subgradient algorithm, Algorithm III.3, for which Theorem III.29 indicates that a rate of $b^{-1/2}$ as $b \to \infty$ is attainable. The dependence of the discretization rate on $m$, the uncertainty dimension, implies that, under the convexity-concavity assumption (Assumption III.20), discretization algorithms will likely not be competitive against ε-subgradient algorithms for semi-infinite problems with high uncertainty dimension. In this section, we provide some indication of the range of values of $m$ where discretization algorithms are not competitive with ε-subgradient algorithms for convex-concave problems.

From Theorems III.9 and III.11, the asymptotic rate of decay of error bound for a discretization algorithm that uses a quadratically convergent algorithm map is the same as that of a linearly convergent algorithm map. In this section, we examine whether there is any numerical difference between a superlinearly convergent algorithm map and a linearly convergent algorithm map.

We compare the following algorithms over a set of problem instances from Rustem and Howe (2002):

(i) Algorithm III.1 with ε-PPP (an active-set version of PPP as stated in Algorithm 2.4.34 in Polak 1997; see also Polak 2008) as the algorithm map.

(ii) Algorithm III.1 with SQP-2QP (Algorithm 2.1 of Zhou & Tits 1996, an SQP algorithm with two QPs) as the algorithm map.

(iii) The ε-subgradient algorithm, Algorithm III.3.

The first two algorithms are discretization algorithms, while the third is an ε-subgradient algorithm. We refer to Appendix D for details on the algorithms and the algorithm parameters used.

We use Problems 1 and 5 from Rustem and Howe (2002, pp. 100-102), which are two- and three-dimensional in $y$, respectively. We also modify Problem 1 to create a one-dimensional (in $y$) problem instance. We call the three problem instances SProbA, SProbB, and SProbC, in increasing order of $y$-dimensionality; see Appendix C for details.

As in Chapter II, we implement and run all algorithms in MATLAB version 7.7.0 (R2008b) (see Mathworks, 2009) on a 3.73 GHz PC using Windows XP SP3, with 3 GB of RAM. All QPs are solved using TOMLAB CPLEX version 7.0 (R7.0.0) (see Tomlab, 2009) with the Primal Simplex option. In Step 2 of Algorithm III.3, we use TOMLAB SNOPT version 7.2-5 (see Gill, Murray, & Saunders, 2007) to find the ε-maximizer.

In Chapter II, we consider problem instances with uncertainty dimension of one, and we use discretization parameters $N$ on the order of $10^6$ to achieve reasonable solution tolerance. In all the finite minimax algorithms considered (including SQP-2QP and ε-PPP), one of the steps is to compute the function values at all the grid points at the current iterate. In Chapter II, we implement the function evaluation step using vector operations on all the grid points in a single line of code, instead of “looping” through each grid point, to ensure better efficiency. In this chapter, we consider problems with uncertainty dimensions higher than one, and the discretization parameters required to achieve reasonable solution tolerance increase to orders of $10^8$ and above. This requires too much memory if the same implementation as that in Chapter II is used. Thus, we use “looping,” evaluating subsets of the grid points in each loop. The different implementation is applied for SProbB ($m = 2$) and SProbC ($m = 3$), since the original, more efficient implementation still works for SProbA ($m = 1$). Note that this issue does not affect ε-subgradient algorithms as they do not deal with grid points.
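The chunked (“looping”) evaluation described above can be sketched as follows. Instead of materializing all $N_b$ function values at once, the grid is generated lazily and processed in fixed-size chunks while a running maximum is kept, so memory grows with the chunk size rather than with $N_b$. This is a plain-Python illustration, not the MATLAB implementation used in the experiments; the uniform grid on $[0, 1]^m$ with $N_b^{1/m}$ points per axis, the function names, and the chunk size are assumptions.

```python
import itertools

def grid_points(n_per_axis, m, lo=0.0, hi=1.0):
    """Lazily yield a uniform grid of n_per_axis**m points in [lo, hi]^m (n_per_axis >= 2)."""
    axis = [lo + (hi - lo) * k / (n_per_axis - 1) for k in range(n_per_axis)]
    return itertools.product(axis, repeat=m)

def chunked_max(phi, x, points, chunk_size=10_000):
    """Return the max over the grid of phi(x, y), evaluating chunk_size points at a time."""
    best = float("-inf")
    it = iter(points)
    while True:
        chunk = list(itertools.islice(it, chunk_size))   # only this chunk is held in memory
        if not chunk:
            return best
        best = max(best, max(phi(x, y) for y in chunk))

def phi_example(x, y):
    """Hypothetical phi: one minus the squared distance between x and y."""
    return 1.0 - sum((yi - xi) ** 2 for yi, xi in zip(y, x))

# N_b = 51**3 grid points for m = 3, processed in chunks of 10,000.
print(chunked_max(phi_example, (0.5, 0.5, 0.5), grid_points(51, 3)))   # 1.0
```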

We report run times to achieve a solution $x$ that satisfies

$$\|x - x_{\text{target}}\| \le t, \qquad \text{(III.147)}$$

where the error tolerance $t = 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}$, and $x_{\text{target}}$ is a target solution (see Table 18 in Appendix C) obtained by Algorithm III.3. We refer to Appendix C for details on the procedure to obtain $x_{\text{target}}$. Algorithm III.3 is chosen for these verification analyses as preliminary experiments show that it is significantly more efficient than the other two algorithms, especially for problems with uncertainty dimension $m > 2$. Although the termination criterion (III.147) cannot be used for real-world problems, as $x_{\text{target}}$ is usually unknown beforehand, we find that it is the most useful criterion in this study.

1.  Problem  Instance  of  Uncertainty  Dimension  One 

Table 8 summarizes the run times (in seconds) of Algorithm III.1 with ε-PPP, for various discretization parameters $N_b$ across the top row, to achieve various error tolerances $t$ listed in the first column. Run times in boldface indicate the particular discretization parameter $N_b$ that produces the shortest run time for the specific error tolerance $t$. An asterisk * in the table indicates that the particular discretization parameter is insufficient to achieve the desired error tolerance. For example, in Table 8 with $N_b = 1{,}000$, we observe that the iterates do not change after a certain time (within six hours), and the required error tolerance of $t = 10^{-4}$ has not been met. A double asterisk ** indicates that the algorithm failed to satisfy the required error tolerance after six hours. Preliminary experiments show that Algorithm III.3 produces run times no longer than ten seconds for all problem instances considered. Thus, we choose an arbitrary maximum run time of six hours (significantly longer than ten seconds). As mentioned, the MATLAB implementation for ε-PPP on SProbA computes the function values in a single line of code. As there is insufficient memory to store the function values of all $N_b = 100{,}000{,}000$ functions, this leads to the memory issues indicated by “mem” in Table 8.


t \ N_b   100        1,000      10,000     100,000    1,000,000   10,000,000   100,000,000
10^-1     0.83 (7)   0.57 (7)   0.66 (7)   1.6 (7)    11.4 (7)    107.6 (7)    mem
10^-2     0.84 (10)  0.77 (10)  0.98 (10)  2.2 (10)   15.8 (10)   149.0 (10)   mem
10^-3     *          2.2 (13)   1.5 (12)   3.9 (12)   22.4 (12)   207.1 (12)   mem
10^-4     *          *          4.5 (15)   9.1 (15)   36.7 (14)   334.7 (14)   mem
10^-5     *          *          *          **         **          **           mem

Table 8. Run times (in seconds) for SProbA using Algorithm III.1 with ε-PPP. The numbers in parentheses indicate the number of iterations. An asterisk * indicates that the particular discretization parameter is insufficient to achieve the desired error tolerance, while a double asterisk ** indicates that (III.147) is not satisfied after six hours. The word “mem” means that the algorithm terminates due to insufficient memory.


Table 9 summarizes the run times of Algorithm III.1 with SQP-2QP. The faster run times in Table 9 compared to Table 8 are due to the superlinear rate of convergence of the SQP-2QP algorithm map compared to the linear rate of convergence of ε-PPP.

We see from Tables 8 and 9, as well as from the subsequent run times for SProbB and SProbC, that the discretization parameter $N_b$ that produces the fastest run times varies between problems and tolerances. Thus, it is difficult to determine the “right” discretization parameter to use.

Table 10 summarizes the run times of Algorithm III.3 for SProbA. Comparing the run times for the three algorithms (ignoring the issue that discretization parameters are difficult to determine), we see that the discretization algorithms (Tables 8 and 9) are generally competitive against the ε-subgradient algorithm (Table 10) for problems with uncertainty dimension of one.
t \ N_b   100        1,000      10,000     100,000    1,000,000   10,000,000   100,000,000
10^-1     0.12 (5)   0.14 (5)   0.11 (5)   0.51 (5)   4.4 (5)     42.6 (5)     mem
10^-2     0.17 (6)   0.18 (6)   0.12 (6)   0.59 (6)   5.1 (6)     48.8 (6)     mem
10^-3     *          0.15 (7)   0.13 (7)   0.65 (7)   5.7 (7)     55.1 (7)     mem
10^-4     *          *          0.13 (8)   0.81 (8)   6.4 (8)     61.4 (8)     mem
10^-5     *          *          *          *          *           *            mem

Table 9. Run times (in seconds) for SProbA using Algorithm III.1 with SQP-2QP. The numbers in parentheses indicate the number of iterations. An asterisk * indicates that the particular discretization parameter is insufficient to achieve the desired error tolerance. The word “mem” means that the algorithm terminates due to insufficient memory.


t            10^-1      10^-2      10^-3      10^-4      10^-5
Run times    0.18 (2)   0.25 (3)   0.36 (4)   0.34 (5)   0.45 (6)

Table 10. Run times (in seconds) for SProbA using Algorithm III.3. The numbers in parentheses indicate the number of iterations.

2.  Problem  Instance  of  Uncertainty  Dimension  Two 

Tables 11-13 summarize the run times for SProbB. The discretization parameter $N_b$ is chosen such that $N_b^{1/m} \in \mathbb{N}$. The run times for the two discretization algorithms are generally an order of magnitude slower than those for SProbA, while the run times for Algorithm III.3 remain within the same order of magnitude. These results provide some validation of the $b^{-1/m}$ rate of decay of the error bound obtained for the two discretization algorithms in Theorems III.9 and III.11, and of the $b^{-1/2}$ rate for Algorithm III.3 in Theorem III.29.
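The condition $N_b^{1/m} \in \mathbb{N}$ means that each listed discretization size is a perfect $m$-th power, with one axis of $N_b^{1/m}$ points per uncertainty dimension. The small sketch below checks this for the sizes used in Tables 11-16. The helper `integer_mth_root` is our own, not part of the original implementation.

```python
from typing import Optional


def integer_mth_root(n: int, m: int) -> Optional[int]:
    """Return the positive integer k with k**m == n, or None if no such k exists."""
    k = round(n ** (1.0 / m))
    # Check neighbours to guard against floating-point error in n ** (1/m).
    for cand in (k - 1, k, k + 1):
        if cand > 0 and cand ** m == n:
            return cand
    return None


# Discretization sizes N_b taken from the column headers of Tables 11-16.
SPROB_B = [1_024, 10_000, 100_489, 1_000_000, 10_004_569, 100_000_000, 1_000_014_129]  # m = 2
SPROB_C = [1_000, 10_648, 103_823, 1_000_000, 10_077_696, 100_544_625, 1_000_000_000]  # m = 3

for m, sizes in ((2, SPROB_B), (3, SPROB_C)):
    print(m, [integer_mth_root(nb, m) for nb in sizes])
```

The points per axis are 32, 100, 317, 1,000, 3,163, 10,000, and 31,623 for SProbB. For SProbC they are 10, 22, 47, 100, 216, 465, and 1,000.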

3.  Problem  Instance  of  Uncertainty  Dimension  Three 

Tables 14-16 summarize the run times for SProbC. We see more evidence that the rate of decay of the error bound for Algorithm III.3 is independent of $m$. Specifically, the ratios of run times for Algorithm III.3 to attain error tolerances of $10^{-1}$ and $10^{-5}$ are 0.45/0.18 = 2.5 (SProbA, where $m = 1$), 1.1/0.21 = 5.2 (SProbB, where $m = 2$), and 4.0/1.6 = 2.5 (SProbC, where $m = 3$). These ratios provide some validation of Theorem III.29, which states that the asymptotic rate of decay of the error bound for Algorithm III.3 is $b^{-1/2}$, which is independent of $m$.

t \ N_b   1,024     10,000    100,489   1,000,000   10,004,569   100,000,000   1,000,014,129
10^-1     2.5 (5)   1.5 (5)   4.9 (5)   29.6 (5)    281.9 (5)    2720 (5)      **
10^-2     *         2.0 (6)   6.2 (7)   39.6 (7)    380.8 (7)    3605 (7)      **
10^-3     *         *         *         49.9 (9)    535.1 (9)    4720 (9)      **
10^-4     *         *         *         49.4 (9)    670.1 (10)   6899 (11)     **
10^-5     *         *         *         *           *            **            **

Table 11. Run times (in seconds) for SProbB using Algorithm III.1 with ε-PPP. The numbers in parentheses indicate the number of iterations. An asterisk * indicates that the particular discretization parameter is insufficient to achieve the desired error tolerance, while a double asterisk ** indicates that (III.147) is not satisfied after six hours.

t \ N_b   1,024      10,000     100,489   1,000,000   10,004,569   100,000,000   1,000,014,129
10^-1     0.72 (4)   0.46 (4)   2.3 (5)   12.4 (4)    134.4 (5)    1146 (4)      10879 (5)
10^-2     *          0.53 (5)   2.5 (6)   14.4 (5)    155.2 (6)    1346 (5)      12475 (6)
10^-3     *          *          *         16.7 (6)    175.6 (7)    1532 (6)      13881 (7)
10^-4     *          *          *         *           *            1719 (7)      15178 (8)
10^-5     *          *          *         *           *            *             *

Table 12. Run times (in seconds) for SProbB using Algorithm III.1 with SQP-2QP. The numbers in parentheses indicate the number of iterations. An asterisk * indicates that the particular discretization parameter is insufficient to achieve the desired error tolerance.

t            10^-1      10^-2      10^-3      10^-4      10^-5
Run times    0.21 (5)   0.35 (7)   0.40 (8)   0.71 (9)   1.1 (9)

Table 13. Run times (in seconds) for SProbB using Algorithm III.3. The numbers in parentheses indicate the number of iterations.

For the discretization algorithms, we see that the increase in run time is strongly dependent on $m$. For Algorithm III.1 with ε-PPP, the ratios of run times to attain error tolerances of $10^{-1}$ and $10^{-4}$ are 4.5/0.57 = 7.9 (SProbA), 49.4/1.5 = 32.9 (SProbB), and > 21,600/1.4 ≈ 15,400 (SProbC), where 21,600 seconds is the six-hour limit. For Algorithm III.1 with SQP-2QP, the ratios of run times to attain error tolerances of $10^{-1}$ and $10^{-4}$ are 0.13/0.11 = 1.2 (SProbA), 1,719/0.46 = 3,737 (SProbB), and > 21,600/0.81 = 26,667 (SProbC). These observations indicate that the additional computational work needed to achieve smaller errors increases as $m$ increases. This again provides validation of Theorems III.9 and III.11, which state that the asymptotic rate of decay of the error bounds for Algorithm III.1 with ε-PPP and SQP-2QP depends on $m$.
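The ratios quoted above can be reproduced from the fastest entries of Tables 8-16 with a few lines of arithmetic. The dictionary layout is only our own bookkeeping. The 21,600 s entries are the six-hour limit, so the SProbC ratios for the discretization algorithms are lower bounds.

```python
# Run-time ratios quoted in the text, using the fastest entries of Tables 8-16.
# 21,600 s is the six-hour limit, so the SProbC discretization ratios are lower bounds.
ratios = {
    "III.3, 1e-1 -> 1e-5": {"SProbA": 0.45 / 0.18, "SProbB": 1.1 / 0.21, "SProbC": 4.0 / 1.6},
    "III.1 + eps-PPP, 1e-1 -> 1e-4": {"SProbA": 4.5 / 0.57, "SProbB": 49.4 / 1.5, "SProbC": 21_600 / 1.4},
    "III.1 + SQP-2QP, 1e-1 -> 1e-4": {"SProbA": 0.13 / 0.11, "SProbB": 1_719 / 0.46, "SProbC": 21_600 / 0.81},
}

for name, row in ratios.items():
    print(name, {prob: round(value, 1) for prob, value in row.items()})
```

The Algorithm III.3 ratios stay between roughly 2.5 and 5.2 as $m$ grows from one to three. The discretization ratios, by contrast, grow by several orders of magnitude.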

Theorems III.9 and III.11 state that the error bounds for the discretization algorithms with a quadratically and a linearly convergent algorithm map decay at the same asymptotic rate as $b \to \infty$. We observe generally faster run times for Algorithm III.1 with SQP-2QP (superlinear) than with ε-PPP (linear), which shows that we are not yet in the asymptotic regime.


t \ N_b   1,000     10,648    103,823     1,000,000   10,077,696   100,544,625   1,000,000,000
10^-1     1.4 (5)   3.0 (4)   13.3 (4)    74.4 (4)    491.3 (4)    3706 (4)      **
10^-2     *         *         20.8 (7)    113.0 (7)   860.5 (8)    6017 (7)      **
10^-3     *         *         *           **          2393 (13)    10546 (10)    **
10^-4     *         *         *           **          **           **            **
10^-5     *         *         *           **          **           **            **

Table 14. Run times (in seconds) for SProbC using Algorithm III.1 with ε-PPP. The numbers in parentheses indicate the number of iterations. An asterisk * indicates that the particular discretization parameter is insufficient to achieve the desired error tolerance, while a double asterisk ** indicates that (III.147) is not satisfied after six hours.




t \ N_b   1,000      10,648    103,823    1,000,000   10,077,696   100,544,625   1,000,000,000
10^-1     0.81 (5)   2.2 (5)   11.3 (5)   59.6 (5)    429.9 (5)    4871 (5)      **
10^-2     *          *         12.9 (6)   69.5 (6)    570.7 (6)    5658 (6)      **
10^-3     *          *         *          *           665.7 (8)    7052 (8)      **
10^-4     *          *         *          *           *            *             **
10^-5     *          *         *          *           *            *             **

Table 15. Run times (in seconds) for SProbC using Algorithm III.1 with SQP-2QP. The numbers in parentheses indicate the number of iterations. An asterisk * indicates that the particular discretization parameter is insufficient to achieve the desired error tolerance, while a double asterisk ** indicates that (III.147) is not satisfied after six hours.


t            10^-1      10^-2      10^-3      10^-4      10^-5
Run times    1.6 (13)   1.8 (21)   2.7 (29)   3.9 (37)   4.0 (44)

Table 16. Run times (in seconds) for SProbC using Algorithm III.3. The numbers in parentheses indicate the number of iterations.

E.  CONCLUSIONS  FOR  SEMI-INFINITE  MINIMAX 

This chapter focuses on the discretization approach to solving unconstrained semi-infinite minimax problems. We develop and compare rate-of-convergence results for various fixed and adaptive discretization algorithms, as well as for an ε-subgradient algorithm. We present a novel way of expressing rate of convergence in terms of computational work instead of the typical number of iterations. We show that a fixed discretization algorithm can achieve the same asymptotic convergence rate attained by an adaptive discretization algorithm. We also show that, under certain convexity-concavity assumptions, the rates of convergence for discretization algorithms depend on the uncertainty dimension, while the rate of convergence for an ε-subgradient algorithm is independent of the uncertainty dimension. This indicates that, under convexity-concavity assumptions, discretization algorithms are not likely to be competitive with ε-subgradient algorithms for problems with large uncertainty dimension. Numerical results show that, for convex-concave problems, discretization algorithms are not competitive with ε-subgradient algorithms for problems with uncertainty dimension as small as two.
IV. SEMI-INFINITE MIN-MAX-MIN PROBLEM

A.  INTRODUCTION 

In this chapter, we consider a generalized semi-infinite min-max-min problem of the form

$$\text{(GMXM)} \qquad \min_{x \in X} \psi(x), \tag{IV.1}$$

where $\psi : \mathbb{R}^d \to \mathbb{R}$ is defined by

$$\psi(x) = \max_{y \in Y} \min_{z \in Z(x,y)} \phi(x,y,z), \tag{IV.2}$$

$X \subset \mathbb{R}^d$ and $Y \subset \mathbb{R}^m$ are compact sets, and the set-valued function $Z : \mathbb{R}^d \times \mathbb{R}^m \to 2^{\mathbb{R}^s}$ is continuous (see Section 5.3 of Polak, 1997 for a definition of the continuity of a set-valued function) as well as compact- and nonempty-valued on $X \times Y$. Moreover, for all $(x,y) \in X \times Y$ and $z \in Z(x,y)$, $\phi(\cdot,\cdot,\cdot)$ is continuous at $(x,y,z)$. In particular, we focus on the special case where $Z(\cdot,\cdot)$ is a constant set $Z \subset \mathbb{R}^s$, but we also deal with the general $Z(\cdot,\cdot)$ case. Throughout the chapter, we refer to the case with a constant set $Z$ as the constant Z case, and to the case of the set-valued function $Z(\cdot,\cdot)$ as the variable Z case. We denote the semi-infinite min-max-min problem for the constant Z case by (SMXM). Also, we refer to (SMXM) and (GMXM) collectively as (MXM) for brevity.

Applications involving min-max-min optimization include floorplan sizing in electronic circuit boards (Chen & Fan, 1998), obstacle avoidance for robots (Kirjner-Neto & Polak, 1998), optimal design centering, tolerancing, and tuning (Tits, 1985), geometric facility location (Cardinal & Langerman, 2006), and network interdiction (Martin, 2007). We give an example of the last of these.

The problem (MXM) is difficult to solve because of its layers of min and max operators. Moreover, as shown in Ralph and Polak (2000), $\psi(\cdot)$ may not have directional derivatives even when $\phi(\cdot,\cdot,\cdot)$ is smooth. This implies that defining suitable optimality conditions is difficult. These difficulties have resulted in a rather limited literature on (SMXM), and so far there is no solution approach for (GMXM).

Ralph and Polak (2000) propose an approach that deals with (SMXM), mainly with $X = \mathbb{R}^d$. The assumptions are (i) $X$, $Y$, and $Z$ are compact, and (ii) for all $(y,z) \in Y \times Z$, $\phi(\cdot,y,z)$ is continuously differentiable on $X$. The authors first discretize $Y$ and $Z$ into $Y_N$ and $Z_M$, where $N, M \in \mathbb{N}$ are the cardinalities of $Y_N$ and $Z_M$, respectively. Then a master algorithm is used to solve a sequence of discretized min-max-min problems with increasing levels of discretization. The finite min-max-min algorithm map used to solve the discretized problems (within the master algorithm) combines an Armijo-type line search and a trust-region approach. The authors then discuss how an exact penalization method can be used to eliminate the constraints defining $X$, if any are present. The main challenge in the approach is that $MN$ linear programs must be solved in each iteration of the algorithm map to determine the search direction. As noted in the paper, this is expected to be a highly computationally intensive task. The paper contains no numerical results.

In Polak (2003), we find another approach for (SMXM) with $X = \mathbb{R}^d$, which uses the same assumptions and initial discretization step as Ralph and Polak (2000). The author then applies exponential smoothing (as described in Chapter II) to the innermost minimization problem to obtain a finite minimax problem. The proposed algorithm also consists of a master algorithm, which solves a sequence of finite minimax problems with increasing levels of discretization using the Pshenichnyi-Pironneau-Polak (PPP) minimax algorithm map. The main challenge in this algorithm is that, as the level of discretization increases, there are more functions indexed by $Z_M$. To keep the smoothing error small, the smoothing parameter needs to increase, which may lead to ill-conditioning. Again, the paper contains no numerical results that would indicate how the algorithm performs in practice.
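The smoothing trade-off described above can be illustrated with the standard log-sum-exp form of exponential smoothing of a finite minimum. We assume this is the form meant in Chapter II. The functions `smooth_min` and `required_p` are our own sketch, not code from Polak (2003). For $M$ functions, the smoothing error is at most $\log(M)/p$. Holding that error below a tolerance therefore needs $p \ge \log(M)/\text{tol}$, and this bound grows as the discretization is refined.

```python
import math
from typing import Sequence


def smooth_min(values: Sequence[float], p: float) -> float:
    """Log-sum-exp smoothing of min(values) with smoothing parameter p > 0.

    Computes -(1/p) * log(sum_i exp(-p * v_i)), shifted by the true minimum
    to avoid overflow. For M values the result lies in
    [min(values) - log(M)/p, min(values)].
    """
    if p <= 0:
        raise ValueError("p must be positive")
    vmin = min(values)
    total = sum(math.exp(-p * (v - vmin)) for v in values)
    return vmin - math.log(total) / p


def required_p(num_functions: int, tol: float) -> float:
    """Smallest p for which the worst-case bound log(M)/p is at most tol."""
    return math.log(num_functions) / tol


for num_funcs in (10, 1_000, 100_000):
    print(num_funcs, round(required_p(num_funcs, 1e-3)))
```

The required $p$ grows only logarithmically in $M$. However, large $p$ makes the smoothed function's curvature large near kinks, which is the source of the ill-conditioning noted above.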
This chapter proposes a novel approach to solve (MXM). We assume that (i) $X$ and $Z$ are compact and convex, and (ii) for all $y \in Y$, $\phi(\cdot,y,\cdot)$ is continuously differentiable and convex on $X \times Z$. We discretize $Y$ into $Y_N$ to obtain a discretized min-max-min problem, which we then reformulate into a discretized min-min-max problem of larger dimensionality. Finally, we observe that the discretized min-min-max problem can be interpreted as a constrained finite minimax problem. Under our convexity assumptions, we show that for any $N \in \mathbb{N}$, solving the constrained finite minimax problem yields a global minimizer of the discretized min-max-min problem. Moreover, as the level of discretization $N$ increases to infinity, the points constructed approach a global minimizer of (MXM). The algorithms in Ralph and Polak (2000) and Polak (2003) do not guarantee convergence to a global minimizer, even under our convexity assumptions. The main challenge in our approach is the size of the constructed constrained finite minimax problem, which has $N$ functions and $d + Ns$ variables, where $d$ and $s$ are the dimensions of $X$ and $Z$, respectively.

Martin (2007) uses a similar conversion from a min-max-min problem to a min-min-max problem for the case with binary variables in the outer min-max. There, the min-min-max problem provides a lower bound on the optimal objective-function value. Another possible way of converting a min-max-min problem into a min-min-max problem is to use von Neumann’s minimax theorem (see, for example, Theorem 5.5.5 of Polak, 1997). However, this requires that the sets $Y$ and $Z$ be compact, convex, and constant. It also requires that, for all $x \in X$ and $z \in Z$, $\phi(x,\cdot,z)$ be concave on $Y$, and that, for all $x \in X$ and $y \in Y$, $\phi(x,y,\cdot)$ be convex on $Z$.

The next section shows that (MXM) arises in network interdiction problems. Section C outlines a new approach to solve (MXM). In Section D, we obtain some numerical results by applying the approach to a network interdiction problem. Section E concludes the chapter.
B. DEFENDER-ATTACKER-DEFENDER EXAMPLE

In this section, we describe a defender-attacker-defender (DAD) network interdiction problem. We provide two different formulations for the problem, one in the form of (SMXM) and another in the form of (GMXM).

We consider a network $G$ with node set $V$ and arc set $E$. A node $u \in V$ can provide nonnegative supply up to a maximum of $\mathrm{ubsupply}_u$ at a cost of $\mathrm{costsupply}_u$ per unit of supply. Node $u$ also requires a given $\mathrm{demand}_u$ of supplies. A defender first decides how much supply to place at node $u$, which we denote by $\mathrm{SUPPLY}_u$.

Second, an attacker decides on the quantity of sorties, $\mathrm{SORTIE}_{u,v}$, used to attack each arc $(u,v) \in E$, subject to a maximum of totalsorties. The attacker's intent is to maximize the defender’s cost, which is defined later. We use an exponential damage function (see, for example, Nugent, 1969; Capps, 1970) to model the capacity reduction of an attacked arc. We treat $\mathrm{SORTIE}_{u,v}$ as a continuous variable because we assume that aircraft carry bomb loads that can be distributed in any way over several arcs. Note that our proposed approach can handle integer restrictions on the decision variables for the maximization in (MXM), which are $\mathrm{SORTIE}_{u,v}$ in the DAD problem.

Third, the defender sends flow of supplies between nodes in an attempt to meet demand. The parameters $\mathrm{lbflow}_{u,v}$ and $\mathrm{ubflow}_{u,v}$ represent the lower and upper bounds on the flow across arc $(u,v)$ before the attack, and $\mathrm{vulcap}_{u,v}$ represents the amount of capacity of arc $(u,v)$ that is vulnerable to attack. Based on Nugent (1969) and Capps (1970), the remaining capacity of arc $(u,v)$ after the attack is

$$\mathrm{ubflow}_{u,v} - \mathrm{vulcap}_{u,v}\left[1 - \exp(-\mathrm{vul}_{u,v}\,\mathrm{SORTIE}_{u,v})\right], \tag{IV.3}$$

where $\mathrm{vul}_{u,v}$ represents the vulnerability of arc $(u,v)$. A larger value of $\mathrm{vul}_{u,v}$ means that the arc is more vulnerable to attacks. The vulnerability parameter $\mathrm{vul}_{u,v}$ indicates the efficiency of a sortie against arc $(u,v)$.
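The exponential damage model above is straightforward to evaluate. The sketch below implements the post-attack capacity formula for a single arc. The function name and the arc data in the example are hypothetical and serve only to show the shape of the curve.

```python
import math


def remaining_capacity(ubflow: float, vulcap: float, vul: float, sorties: float) -> float:
    """Post-attack capacity of one arc:
    ubflow - vulcap * (1 - exp(-vul * sorties)).
    """
    if sorties < 0:
        raise ValueError("sorties must be nonnegative")
    return ubflow - vulcap * (1.0 - math.exp(-vul * sorties))


# Hypothetical arc data: 100 units of capacity, 60 of which are vulnerable, vul = 0.5.
for s in (0.0, 1.0, 2.0, 5.0, 20.0):
    print(s, round(remaining_capacity(100.0, 60.0, 0.5, s), 2))
```

Each additional sortie removes less capacity than the previous one. The capacity never drops below $\mathrm{ubflow}_{u,v} - \mathrm{vulcap}_{u,v}$, which is 40 in this example.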

The objective function in the problem is the sum of (i) the cost to place supply at nodes and (ii) the cost to send flow through the network after the attack to satisfy demands. We model the nonlinear effects of congestion on the cost of sending flow across arc $(u,v)$ (see, for example, p. 651 of Ahuja, Magnanti, & Orlin, 1993) by

$$\frac{\mathrm{costflow}_{u,v}\,\mathrm{FLOW}_{u,v}}{\mathrm{ubflow}_{u,v} - \mathrm{FLOW}_{u,v} + \epsilon}, \tag{IV.4}$$

where $\epsilon > 0$ is a small number that ensures that the cost of flow remains bounded as $\mathrm{FLOW}_{u,v}$ approaches $\mathrm{ubflow}_{u,v}$.
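A minimal sketch of the congestion cost for one arc follows. It shows how the cost rises sharply as the flow nears the upper bound and how $\epsilon$ keeps the cost finite at the bound. The function name and the numbers in the example are hypothetical.

```python
def congestion_cost(flow: float, costflow: float, ubflow: float, eps: float) -> float:
    """costflow * flow / (ubflow - flow + eps) for one arc.

    The eps > 0 term keeps the cost finite when flow == ubflow.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if flow > ubflow:
        raise ValueError("flow must not exceed ubflow")
    return costflow * flow / (ubflow - flow + eps)


# Hypothetical arc: unit cost coefficient, upper bound 10, eps = 1e-3.
for f in (0.0, 5.0, 9.0, 9.9, 10.0):
    print(f, round(congestion_cost(f, 1.0, 10.0, 1e-3), 3))
```

At full flow the cost equals $\mathrm{costflow}_{u,v}\,\mathrm{ubflow}_{u,v}/\epsilon$, which is 10,000 here. Choosing $\epsilon$ therefore trades off fidelity to the congestion model against numerical scaling.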

In  this  problem,  we  assume  perfect  information,  i.e.,  both  the  defender  and 
attacker  know  the  full  characteristics  of  the  network  in  terms  of  bounds  on  the  flow 
on  each  arc,  the  vulnerability  of  each  arc,  etc.  We  also  assume  that  the  defender 
knows  the  maximum  sorties  that  the  attacker  can  launch,  the  attacker  knows  where 
the  supplies  are  placed  before  launching  the  sorties,  and  finally,  the  defender  knows 
the  remaining  capacity  of  all  the  arcs  in  the  network  before  sending  flow  to  satisfy 
the  demands. 

Next, we give the formulation in both the (SMXM) and the (GMXM) form, each followed by a detailed explanation. In (SMXM), the feasible region of the inner minimization problem must be independent of the decision variables of the outer minimization and the maximization. Hence, the capacity and balance of flow constraints are accounted for by penalty terms in the objective function. In the case of (GMXM), capacity and balance of flow are imposed as constraints.
Indices

u ∈ V              node (alias v)
(u,v) ∈ E          arc directed from node u to node v

Data

costsupply_u       cost to place supply at node u
costflow_{u,v}     cost coefficient for flow between nodes u and v
demand_u           demand at node u
ε                  a small number that ensures bounded flow cost
lbflow_{u,v}       nonnegative lower bound on flow on arc (u,v)
penbal_u           penalty parameter for violation of flow balance constraints
pencap_{u,v}       penalty parameter for violation of capacity constraints
totalsorties       total number of sorties the attacker can fly
ubflow_{u,v}       upper bound on flow on arc (u,v)
ubsupply_u         upper bound on supplies at node u
vul_{u,v}          vulnerability of arc (u,v)
vulcap_{u,v}       amount of capacity of arc (u,v) vulnerable to attack

Decision Variables

EXCESSSUPPLY_u     recourse supply removed from node u
EXTRASUPPLY_u      recourse supply placed at node u
FLOW_{u,v}         flow from node u to node v
SORTIE_{u,v}       number of sorties to attack arc (u,v)
SUPPLY_u           amount of supply to be placed at node u


We denote the vector that contains all the components $\mathrm{FLOW}_{u,v}$, $(u,v) \in E$, by FLOW, and similarly for EXCESSSUPPLY, EXTRASUPPLY, SORTIE, and SUPPLY.


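DAD Formulation of the form (SMXM): constant Z case

$$
\begin{aligned}
\min_{\mathrm{SUPPLY}} \Bigg\{ & \sum_{u \in V} \mathrm{costsupply}_u\,\mathrm{SUPPLY}_u + \max_{\mathrm{SORTIE}} \bigg\{ \min_{\mathrm{FLOW}} \Big\{ \sum_{(u,v) \in E} \frac{\mathrm{costflow}_{u,v}\,\mathrm{FLOW}_{u,v}}{\mathrm{ubflow}_{u,v} - \mathrm{FLOW}_{u,v} + \epsilon} \\
& + \sum_{u \in V} \mathrm{penbal}_u \Big( \sum_{v:(u,v) \in E} \mathrm{FLOW}_{u,v} - \sum_{v:(v,u) \in E} \mathrm{FLOW}_{v,u} - \mathrm{SUPPLY}_u + \mathrm{demand}_u \Big)^2 \\
& + \sum_{(u,v) \in E} \mathrm{pencap}_{u,v} \max\Big\{ 0,\ \mathrm{FLOW}_{u,v} - \mathrm{ubflow}_{u,v} + \mathrm{vulcap}_{u,v}\left[1 - \exp(-\mathrm{vul}_{u,v}\,\mathrm{SORTIE}_{u,v})\right] \Big\} \\
& \text{s.t. } \mathrm{lbflow}_{u,v} \le \mathrm{FLOW}_{u,v} \le \mathrm{ubflow}_{u,v} \quad \forall (u,v) \in E \Big\} \\
& \text{s.t. } \sum_{(u,v) \in E} \mathrm{SORTIE}_{u,v} \le \mathrm{totalsorties}, \quad \mathrm{SORTIE}_{u,v} \ge 0 \quad \forall (u,v) \in E \bigg\} \\
& \text{s.t. } 0 \le \mathrm{SUPPLY}_u \le \mathrm{ubsupply}_u \quad \forall u \in V \Bigg\}
\end{aligned}
$$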
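DAD Formulation of the form (GMXM): variable Z case

$$
\begin{aligned}
\min_{\mathrm{SUPPLY}} \Bigg\{ \max_{\mathrm{SORTIE}} \bigg\{ \min_{\substack{\mathrm{FLOW},\\ \mathrm{EXCESSSUPPLY},\\ \mathrm{EXTRASUPPLY}}} \Big\{ & \sum_{u \in V} \mathrm{costsupply}_u\,\mathrm{SUPPLY}_u + \sum_{(u,v) \in E} \frac{\mathrm{costflow}_{u,v}\,\mathrm{FLOW}_{u,v}}{\mathrm{ubflow}_{u,v} - \mathrm{FLOW}_{u,v} + \epsilon} \\
& + \sum_{u \in V} \mathrm{penbal}_u\,\mathrm{EXCESSSUPPLY}_u + \sum_{u \in V} \mathrm{penbal}_u\,\mathrm{EXTRASUPPLY}_u \\
\text{s.t. } & \sum_{v:(u,v) \in E} \mathrm{FLOW}_{u,v} - \sum_{v:(v,u) \in E} \mathrm{FLOW}_{v,u} = \mathrm{SUPPLY}_u - \mathrm{demand}_u - \mathrm{EXCESSSUPPLY}_u + \mathrm{EXTRASUPPLY}_u \quad \forall u \in V \\
& \mathrm{lbflow}_{u,v} \le \mathrm{FLOW}_{u,v} \le \mathrm{ubflow}_{u,v} - \mathrm{vulcap}_{u,v}\left[1 - \exp(-\mathrm{vul}_{u,v}\,\mathrm{SORTIE}_{u,v})\right] \quad \forall (u,v) \in E \\
& \mathrm{EXCESSSUPPLY}_u \ge 0, \quad \mathrm{EXTRASUPPLY}_u \ge 0 \quad \forall u \in V \Big\} \\
\text{s.t. } & \sum_{(u,v) \in E} \mathrm{SORTIE}_{u,v} \le \mathrm{totalsorties}, \quad \mathrm{SORTIE}_{u,v} \ge 0 \quad \forall (u,v) \in E \bigg\} \\
\text{s.t. } & 0 \le \mathrm{SUPPLY}_u \le \mathrm{ubsupply}_u \quad \forall u \in V \Bigg\}
\end{aligned}
$$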


Discussion 

The main differences between the two formulations lie in how the capacity and balance of flow constraints are modeled. In the variable Z case, we model them explicitly. The dummy variables EXCESSSUPPLY and EXTRASUPPLY keep the model feasible even if the sorties have reduced the capacity of the network to a level where (i) excess supply cannot flow out of a node or (ii) demand at a node is not fully satisfied. If that happens, we penalize the violation based on (i) the excess supply that needs to be removed or (ii) the extra supply required to satisfy demand fully.
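For fixed flows, the balance constraint of the (GMXM) formulation determines only the difference $\mathrm{EXTRASUPPLY}_u - \mathrm{EXCESSSUPPLY}_u$ at each node. With positive $\mathrm{penbal}_u$, the cheapest choice puts that residual entirely into one of the two variables. The sketch below computes this choice for a hypothetical two-node network. The function name and data layout are our own illustration and are not part of the model.

```python
def recourse_supplies(supply, demand, flow):
    """Smallest nonnegative EXCESSSUPPLY_u and EXTRASUPPLY_u satisfying
        out_u - in_u = SUPPLY_u - demand_u - EXCESSSUPPLY_u + EXTRASUPPLY_u
    for fixed arc flows. `supply` and `demand` map nodes to values;
    `flow` maps arcs (u, v) to FLOW_{u,v}. Returns (excess, extra) dicts.
    """
    net_out = {u: 0.0 for u in supply}
    for (u, v), f in flow.items():
        net_out[u] += f  # flow leaving u
        net_out[v] -= f  # flow entering v
    excess, extra = {}, {}
    for u in supply:
        # Residual r = EXTRASUPPLY_u - EXCESSSUPPLY_u forced by the balance equation.
        r = net_out[u] - supply[u] + demand[u]
        extra[u] = max(r, 0.0)
        excess[u] = max(-r, 0.0)
    return excess, extra


# Hypothetical example: node a holds 10 units, node b needs 8, but the attacked
# arc (a, b) can carry only 5.
excess, extra = recourse_supplies({"a": 10.0, "b": 0.0}, {"a": 0.0, "b": 8.0}, {("a", "b"): 5.0})
print(excess, extra)
```

In the example, 5 units of excess supply remain stranded at node a and 3 units of demand at node b go unmet. These are exactly the two situations the penalty terms are meant to price.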

For the constant Z formulation, we model the capacity and balance of flow constraints by including penalty cost terms in the objective function. This makes the inner constraint set a constant set, defined by $0 \le \mathrm{lbflow}_{u,v} \le \mathrm{FLOW}_{u,v} \le \mathrm{ubflow}_{u,v}$ for all arcs $(u,v) \in E$.

The objective functions for both cases express the sum of (i) the cost of placing supply at supply nodes, (ii) the cost of sending flow through the network after the attack to satisfy demands, considering congestion effects, and (iii) penalty terms for violations of the capacity or balance of flow constraints. The other constraints are self-explanatory.

We note that if we view $\mathrm{SORTIE}_{u,v}$ as constants instead of decision variables, the objective functions and feasible sets in both formulations are convex, and the feasible region is defined by linear functions and box constraints. We will revisit this issue when we discuss our new approach to solve (MXM).

C. APPROACH TO SOLVE THE MIN-MAX-MIN PROBLEM

In  this  section,  we  propose  an  approach  to  solve  (MXM)  by  constructing  a 
constrained  finite  minimax  problem.  We  use  this  finite  minimax  problem  to  obtain 
an  approximation  to  a  global  minimizer  of  (MXM)  under  certain  assumptions  on  the 
algorithm  used  to  solve  the  finite  minimax  problem. 

Before we describe the approach to solve (MXM), we state two assumptions that will be used repeatedly throughout the chapter. Assumption IV.1 will be used when we consider the constant Z case, while Assumption IV.2 will be used when we consider the variable Z case.

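Assumption IV.1. $X \subset \mathbb{R}^d$, $Y \subset \mathbb{R}^m$, and $Z \subset \mathbb{R}^s$ are compact sets, and $\phi(\cdot,\cdot,\cdot)$ is continuous on $X \times Y \times Z$.

Assumption IV.2. $X \subset \mathbb{R}^d$ and $Y \subset \mathbb{R}^m$ are compact sets, and the set-valued function $Z : \mathbb{R}^d \times \mathbb{R}^m \to 2^{\mathbb{R}^s}$ is continuous as well as compact- and nonempty-valued on $X \times Y$. Moreover, for all $(x,y) \in X \times Y$ and $z \in Z(x,y)$, $\phi(\cdot,\cdot,\cdot)$ is continuous at $(x,y,z)$.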

1.  Constructing  a  Finite  Minimax  Problem 

In this subsection, we construct a finite minimax problem from (MXM). We first discretize the set $Y \subset \mathbb{R}^m$ to obtain a discretized min-max-min problem. Next, we show that the inner max-min problem is equivalent to a min-max problem. We then observe that the resulting min-min-max problem can be interpreted as a finite minimax problem, but with more variables than (MXM). We cover the details next.

a.  Discretized  Min-Max-Min  Problem 

We introduce the discretized problem for (GMXM):

$$\text{(GMXM}_N\text{)} \qquad \min_{x \in X} \psi_N(x), \tag{IV.5}$$

where $\psi_N : \mathbb{R}^d \to \mathbb{R}$, $N \in \mathbb{N}$, is defined by

$$\psi_N(x) = \max_{y \in Y_N} \min_{z \in Z(x,y)} \phi(x,y,z), \tag{IV.6}$$

and the sets $Y_N \subset Y$, $|Y_N| = N \in \mathbb{N}$, satisfy the property $\mathrm{dist}(Y_N, Y) \to 0$ as $N \to \infty$, with $\mathrm{dist}(\cdot,\cdot)$ being the Hausdorff distance operator defined on p. 65. Under Assumption IV.2, (GMXM$_N$) is well-defined. An example of a discretization scheme that produces $Y_N$ with the above property is the uniform grid discussed on p. 65 for the case when $Y$ is a hyper-box. We denote the equivalent discretized problem for (SMXM) by (SMXM$_N$).
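To see why a uniform grid satisfies $\mathrm{dist}(Y_N, Y) \to 0$, consider a box with $n$ equally spaced points per coordinate, endpoints included. The farthest point of the box from the grid is a cell centre, so in the Euclidean norm the Hausdorff distance equals half the cell diagonal. This vanishes as $n$ grows. The grid construction below is our own assumption and may differ in detail from the grid defined on p. 65. The function names are hypothetical.

```python
import itertools
import math
import random


def uniform_grid(lower, upper, n):
    """Hypothetical uniform grid on the box prod_i [lower[i], upper[i]]:
    n >= 2 equally spaced points per coordinate, endpoints included,
    so an m-dimensional box yields N = n**m points."""
    if n < 2:
        raise ValueError("need at least two points per coordinate")
    axes = [[lo + (hi - lo) * k / (n - 1) for k in range(n)] for lo, hi in zip(lower, upper)]
    return list(itertools.product(*axes))


def grid_hausdorff_distance(lower, upper, n):
    """Euclidean Hausdorff distance between that grid and the box:
    half the diagonal of one grid cell (attained at a cell centre)."""
    return 0.5 * math.sqrt(sum(((hi - lo) / (n - 1)) ** 2 for lo, hi in zip(lower, upper)))


box_lo, box_hi = (0.0, 0.0), (1.0, 1.0)
for n in (2, 11, 101):
    print(n ** 2, grid_hausdorff_distance(box_lo, box_hi, n))
```

For an $m$-dimensional unit box, this distance is $\sqrt{m}/(2(N^{1/m} - 1))$. It decreases only like $N^{-1/m}$, which is one way to see why discretization becomes expensive as $m$ grows.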

We need the following notation for the subsequent analysis of (GMXM):

$$\omega(x,y) = \min_{z \in Z(x,y)} \phi(x,y,z), \tag{IV.7}$$

$$\hat{Z}(x,y) = \{ z \in Z(x,y) \mid \phi(x,y,z) = \omega(x,y) \}, \tag{IV.8}$$

$$\hat{Y}(x) = \{ y \in Y \mid \omega(x,y) = \psi(x) \}, \tag{IV.9}$$

$$\hat{Y}_N(x) = \{ y \in Y_N \mid \omega(x,y) = \psi_N(x) \}. \tag{IV.10}$$
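When every set involved is finite, the quantities $\omega(x,y)$, $\psi_N(x)$, and $\hat{Y}_N(x)$ can be computed by enumeration. The sketch below does this for a variable Z case. The function $\phi$ and the set map $Z(x,y) = \{0, x/4, x/2, 3x/4, x\}$, a finite stand-in for $[0, x]$, are hypothetical examples chosen only for illustration.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Phi = Callable[[float, float, float], float]
ZMap = Callable[[float, float], Iterable[float]]


def omega(phi: Phi, x: float, y: float, z_map: ZMap) -> float:
    """omega(x, y) = min over z in Z(x, y) of phi(x, y, z); Z(x, y) is finite here."""
    return min(phi(x, y, z) for z in z_map(x, y))


def psi_n_and_maximizers(phi: Phi, x: float, y_grid: Sequence[float], z_map: ZMap,
                         tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Return psi_N(x) = max over y in Y_N of omega(x, y), together with the
    grid points attaining it (Yhat_N(x), up to the tolerance tol)."""
    vals = [(y, omega(phi, x, y, z_map)) for y in y_grid]
    best = max(v for _, v in vals)
    return best, [y for y, v in vals if v >= best - tol]


def phi(x: float, y: float, z: float) -> float:
    # Hypothetical example function, not from the text.
    return (z - y) ** 2 + x * y


def z_map(x: float, y: float) -> List[float]:
    # Hypothetical Z(x, y): five points in [0, x] (depends on x only here).
    return [k * x / 4 for k in range(5)]


print(psi_n_and_maximizers(phi, 1.0, [0.0, 0.5, 1.0, 1.5, 2.0], z_map))
```

Because $Z(x,y)$ is capped at $x$, the inner minimum cannot track $y$ once $y > x$. This is how the variable feasible set changes $\omega$.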

We next show the continuity of the functions $\omega(\cdot,\cdot)$, $\psi(\cdot)$, and $\psi_N(\cdot)$ for both the constant and variable Z cases.

Proposition IV.3. Suppose that Assumption IV.1 holds. Then the functions $\omega(\cdot,\cdot)$, $\psi(\cdot)$, and $\psi_N(\cdot)$, $N \in \mathbb{N}$, are continuous on $X \times Y$ and $X$, respectively.

Proof. We refer to Polak (1997, Section 5.4) for the proof. □

Proposition IV.4. Suppose that Assumption IV.2 holds. Then the functions $\omega(\cdot,\cdot)$, $\psi(\cdot)$, and $\psi_N(\cdot)$, $N \in \mathbb{N}$, are continuous on $X \times Y$ and $X$, respectively.

Proof. Based on Corollary 5.4.2 of Polak (1997), $\omega(\cdot,\cdot)$ is continuous on $X \times Y$. Hence, based on Propositions I.1 and I.2, $\psi(\cdot)$ and $\psi_N(\cdot)$ are also continuous on $X$. □
We next show that the discretized min-max-min problems epi-converge to (MXM). We first consider the constant Z case.

Lemma IV.5. Suppose that Assumption IV.1 holds. Then the sequence of problems $\{(\text{SMXM}_N)\}_{N \in \mathbb{N}}$ epi-converges to (SMXM) as $N \to \infty$.

Proof. The proof follows the same arguments as the epi-convergence proof in Theorem 5.2 of Polak (2003) and is included here for completeness. Suppose that $\{x_N\}_{N \in \mathbb{N}}$ is a sequence in $X$ such that $x_N \to x$ as $N \to \infty$, and suppose that $y_N \in \hat{Y}_N(x_N)$ for each $N \in \mathbb{N}$. Since $Y$ is compact, we may assume without loss of generality (by passing to a subsequence along which $\psi_N(x_N)$ attains its limit superior) that $y_N \to y \in Y$ as $N \to \infty$. Then

$$\limsup_{N \to \infty} \psi_N(x_N) = \lim_{N \to \infty} \omega(x_N, y_N) = \omega(x,y) \le \psi(x), \tag{IV.11}$$

where we use the fact that $\omega(\cdot,\cdot)$ is continuous; see Proposition IV.3.

Next, suppose that $\psi(x) = \omega(x, y^*)$ for some $y^* \in \hat{Y}(x)$. Then, since $\mathrm{dist}(Y_N, Y) \to 0$ as $N \to \infty$, there exist $y'_N \in Y_N$ such that $y'_N \to y^*$. Hence,

$$\liminf_{N \to \infty} \psi_N(x_N) \ge \lim_{N \to \infty} \omega(x_N, y'_N) = \omega(x, y^*) = \psi(x). \tag{IV.12}$$

This proves that if $x_N \to x$ as $N \to \infty$, then $\psi_N(x_N) \to \psi(x)$. Based on Proposition I.3, the conclusion follows. □

Lemma IV.6. Suppose that Assumption IV.2 holds. Then the sequence of problems $\{(\text{GMXM}_N)\}_{N \in \mathbb{N}}$ epi-converges to (GMXM) as $N \to \infty$.

Proof. The proof follows the same arguments as the proof of Lemma IV.5, with Proposition IV.4 replacing Proposition IV.3. □

We next provide two theorems, which follow directly from the epi-convergence of the discretized min-max-min problems to (MXM). Again, we consider the constant Z case before the variable Z case.

Theorem IV.7. Suppose that Assumption IV.1 holds. If $\{x_N\}$ is a sequence of global minimizers of (SMXM$_N$) and there exists an infinite subset $K \subset \mathbb{N}$ such that $x_N \to_K x$ as $N \to \infty$, then $x$ is a global minimizer of (SMXM), and $\psi_N(x_N) \to_K \psi(x)$ as $N \to \infty$.
Proof. The conclusion follows from Lemma IV.5 and Proposition I.4. □

Theorem IV.8. Suppose that Assumption IV.2 holds. If $\{x_N\}$ is a sequence of global minimizers of (GMXM$_N$) and there exists an infinite subset $K \subset \mathbb{N}$ such that $x_N \to_K x$ as $N \to \infty$, then $x$ is a global minimizer of (GMXM), and $\psi_N(x_N) \to_K \psi(x)$ as $N \to \infty$.

Proof. The conclusion follows from Lemma IV.6 and Proposition I.4. □

Theorem IV.7 implies that if we pick a large $N \in \mathbb{N}$ and solve (SMXM$_N$) to obtain a global minimizer $x_N$, then $x_N$ is an approximation to a global minimizer of (SMXM). The same is true regarding Theorem IV.8. As (SMXM$_N$) and (GMXM$_N$) are still difficult problems to solve, we next reformulate them into finite minimax problems.


b.  Equivalent  Finite  Minimax  Problem 

In this subsection, we first introduce a discretized min-min-max problem that we show is equivalent, in some sense, to the discretized min-max-min problem (GMXM$_N$). We then show that the new min-min-max problem can be seen as a finite minimax problem. To show the equivalence of this new min-min-max problem to (GMXM$_N$), we introduce some notational changes to (GMXM$_N$).

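Without loss of generality, we assume that $Y_N = \{y_1, y_2, \ldots, y_N\}$, and we re-express

$$\psi_N(x) = \max_{j \in \mathcal{N}} \min_{z \in Z(x,y_j)} \phi^j(x,z), \tag{IV.13}$$

where $\mathcal{N} = \{1, 2, \ldots, N\}$, and the function $\phi^j : \mathbb{R}^d \times \mathbb{R}^s \to \mathbb{R}$, $j \in \mathcal{N}$, is defined by

$$\phi^j(x,z) \triangleq \phi(x, y_j, z). \tag{IV.14}$$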

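We now introduce the problem equivalent to (GMXM$_N$):

$$\text{(GMMX}_N\text{)} \qquad \min_{x \in X} \tilde{\psi}_N(x), \qquad \text{(IV.15)}$$

where $\tilde{\psi}_N : \mathbb{R}^d \to \mathbb{R}$, $N \in \mathbb{N}$, is defined by

$$\tilde{\psi}_N(x) = \min_{\mathbf{z} \in Z_N(x)} \max_{j \in \mathcal{N}} \tilde{\varphi}^j(x, \mathbf{z}), \qquad \text{(IV.16)}$$

$\tilde{\varphi}^j : \mathbb{R}^d \times \mathbb{R}^{Ns} \to \mathbb{R}$, $j \in \mathcal{N}$, is defined by

$$\tilde{\varphi}^j(x, \mathbf{z}) = \varphi^j(x, z_j), \qquad \text{(IV.17)}$$

$\mathbf{z} = (z_1^T, z_2^T, \ldots, z_N^T)^T$, $z_j \in Z(x, y_j)$ for all $j \in \mathcal{N}$, and $Z_N(x) = Z(x, y_1) \times Z(x, y_2) \times \cdots \times Z(x, y_N)$. To allow the min and max operators to be exchanged, we introduce a separate variable $z_j$ for each $y_j \in Y_N$. This expands the dimension of the inner variable by a factor of $N$, from $s$ to $Ns$. We denote the equivalent discretized min-min-max problem for (SMXM) by (SMMX$_N$).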

The next result establishes the equivalence of the new discretized min-min-max problem and the discretized min-max-min problem. We first consider the constant $Z$ case, and define $Z_N = Z \times Z \times \cdots \times Z$ ($N$ times).

Theorem IV.9. Suppose that Assumption IV.1 holds. Then for all $N \in \mathbb{N}$ and $x \in X$, $\tilde{\psi}_N(x) = \psi_N(x)$.

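Proof. For all $x \in X$ and $j \in \mathcal{N}$, since $\varphi^j(x, \cdot)$ is continuous on $Z$ and $Z$ is compact, there exists a $z_j(x) \in Z$ such that

$$\varphi^j(x, z_j(x)) = \min_{z \in Z} \varphi^j(x, z). \qquad \text{(IV.18)}$$

We define $\hat{\mathbf{z}}(x) = (z_1(x)^T, z_2(x)^T, \ldots, z_N(x)^T)^T$. Then

$$\begin{aligned}
\psi_N(x) &= \max_{j \in \mathcal{N}} \varphi^j(x, z_j(x)) \\
&= \min_{\mathbf{z} \in Z_N,\ \mathbf{z} = \hat{\mathbf{z}}(x)} \max_{j \in \mathcal{N}} \tilde{\varphi}^j(x, \mathbf{z}) \\
&\ge \min_{\mathbf{z} \in Z_N} \max_{j \in \mathcal{N}} \tilde{\varphi}^j(x, \mathbf{z}) = \tilde{\psi}_N(x).
\end{aligned} \qquad \text{(IV.19)}$$

Next, for all $x \in X$, since $\max_{j \in \mathcal{N}} \tilde{\varphi}^j(x, \cdot)$ is continuous on $Z_N$ and $Z_N$ is compact, there exists a $\bar{\mathbf{z}}(x) = (\bar{z}_1(x)^T, \ldots, \bar{z}_N(x)^T)^T \in Z_N$ such that

$$\tilde{\psi}_N(x) = \max_{j \in \mathcal{N}} \tilde{\varphi}^j(x, \bar{\mathbf{z}}(x)). \qquad \text{(IV.20)}$$

Then

$$\begin{aligned}
\tilde{\psi}_N(x) &= \max_{j \in \mathcal{N}} \varphi^j(x, \bar{z}_j(x)) \\
&= \max_{j \in \mathcal{N}} \min_{z \in Z,\ z = \bar{z}_j(x)} \varphi^j(x, z) \\
&\ge \max_{j \in \mathcal{N}} \min_{z \in Z} \varphi^j(x, z) = \psi_N(x).
\end{aligned} \qquad \text{(IV.21)}$$

The conclusion follows since $\psi_N(x) \ge \tilde{\psi}_N(x)$ and $\tilde{\psi}_N(x) \ge \psi_N(x)$.

□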
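As a sanity check on Theorem IV.9, the following Python sketch compares both sides for a fixed $x$ when $Z$ is a small finite set (finite sets are compact, so the theorem applies). The value tables standing in for $\varphi^j(x, \cdot)$ are hypothetical. The sketch also shows why one copy $z_j$ per $y_j$ is needed: with a single shared $z$, the min and max generally cannot be exchanged.

```python
import itertools
from typing import Callable, Hashable, Sequence

# For a fixed x, each phi^j(x, .) is represented by a function of z alone.
Phi = Callable[[Hashable], float]


def psi_max_min(phis: Sequence[Phi], Z: Sequence[Hashable]) -> float:
    """psi_N(x) = max_j min_{z in Z} phi^j(x, z) (min-max-min form)."""
    return max(min(phi(z) for z in Z) for phi in phis)


def psi_min_max(phis: Sequence[Phi], Z: Sequence[Hashable]) -> float:
    """tilde psi_N(x) = min over (z_1, ..., z_N) in Z^N of max_j phi^j(x, z_j)."""
    return min(
        max(phi(zj) for phi, zj in zip(phis, zs))
        for zs in itertools.product(Z, repeat=len(phis))
    )


def psi_min_max_shared(phis: Sequence[Phi], Z: Sequence[Hashable]) -> float:
    """min_{z in Z} max_j phi^j(x, z): one z shared by all j (no copies)."""
    return min(max(phi(z) for phi in phis) for z in Z)


# Two functions on Z = {0, 1} whose minimizers differ.
Z = [0, 1]
phis = [lambda z: [0.0, 5.0][z], lambda z: [5.0, 0.0][z]]
print(psi_max_min(phis, Z), psi_min_max(phis, Z), psi_min_max_shared(phis, Z))
# prints: 0.0 0.0 5.0
```

The brute-force enumeration of $Z^N$ is only for illustration. Its cost grows like $|Z|^N$, which is why the dissertation instead treats $(x, z_1, \ldots, z_N)$ as the variables of a single finite minimax problem.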

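We observe that the min-min-max problem (SMMX$_N$) is a constrained finite minimax problem of the form

$$\text{(FMX}_N\text{)} \qquad \min_{w \in W} \Psi_N(w), \qquad \text{(IV.22)}$$

where

$$\Psi_N(w) = \max_{j \in \mathcal{N}} f^j(w), \qquad \text{(IV.23)}$$

$$f^j(w) = \tilde{\varphi}^j(x, \mathbf{z}), \qquad \text{(IV.24)}$$

and $w = (x^T, z_1^T, z_2^T, \ldots, z_N^T)^T \in W = X \times Z_N$.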

Note that we obtain the simpler finite minimax problem (FMX$_N$) from the discretized min-max-min problem (SMXM$_N$) at the expense of a larger number of variables, i.e., $w \in \mathbb{R}^{d+Ns}$.
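The following minimal Python sketch shows how the stacked variable $w = (x, z_1, \ldots, z_N) \in \mathbb{R}^{d+Ns}$ is split and how $\Psi_N(w)$ in (IV.23)-(IV.24) is evaluated. The helper names `unpack` and `Psi_N` and the example function `phi` are hypothetical illustrations, not code from the dissertation.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def unpack(w: Sequence[float], d: int, s: int, N: int) -> Tuple[Vector, List[Vector]]:
    """Split w = (x, z_1, ..., z_N) in R^(d + N*s) into x in R^d and blocks z_j in R^s."""
    if len(w) != d + N * s:
        raise ValueError(f"expected {d + N * s} components, got {len(w)}")
    x = list(w[:d])
    zs = [list(w[d + j * s: d + (j + 1) * s]) for j in range(N)]
    return x, zs


def Psi_N(w: Sequence[float], phi: Callable[..., float], ys: Sequence,
          d: int, s: int) -> float:
    """Psi_N(w) = max_j f^j(w), where f^j(w) = phi(x, y_j, z_j) as in (IV.23)-(IV.24)."""
    x, zs = unpack(w, d, s, len(ys))
    # f^j reads only x and its own block z_j: d + s of the d + N*s variables.
    return max(phi(x, y, z) for y, z in zip(ys, zs))


phi = lambda x, y, z: (x[0] - y) ** 2 + z[0] ** 2  # hypothetical phi(x, y, z), d = s = 1
print(Psi_N([1.0, 0.5, 0.0, 0.0], phi, ys=[0.0, 1.0, 2.0], d=1, s=1))  # 1.25
```

Each $f^j$ touches only $x$ and $z_j$, so its gradient has at most $d + s$ nonzero entries even though $w$ has $d + Ns$ components.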

The  results  above  are  next  generalized  to  the  variable  Z  case. 

Theorem IV.10. Suppose that Assumption IV.2 holds. Then for all $N \in \mathbb{N}$ and $x \in X$, $\tilde{\psi}_N(x) = \psi_N(x)$.

Proof.  The  proof  follows  the  same  arguments  as  the  proof  for  Theorem  IV. 9,  with 
obvious  notational  changes.  □ 

The generalized min-min-max problem (GMMX$_N$) is also a constrained finite minimax problem, with a form similar to (FMX$_N$) defined in (IV.22)-(IV.24), except that the set $W$ is replaced by $W_X = \{(x, \mathbf{z}) \in \mathbb{R}^d \times \mathbb{R}^{Ns} \mid x \in X,\ \mathbf{z} \in Z_N(x)\}$. We denote the constrained finite minimax problem for (GMXM) by (GFMX$_N$).

We  next  propose  an  algorithm  that  produces  an  approximation  to  a 
global  minimizer  of  (MXM)  by  solving  the  constructed  finite  minimax  problem. 
2. Algorithm for Semi-Infinite Min-Max-Min

In this subsection, under the assumption that there exists a constrained finite minimax algorithm that produces a global minimizer of (FMX$_N$), we propose an algorithm that obtains a point close to a global minimizer of (SMXM). In the numerical section, we describe a constrained finite minimax algorithm that satisfies this assumption. In this subsection, we only consider the constant $Z$ case; however, all the results apply equally to the variable $Z$ case.

From this point on, we refer to the algorithms applied to solve (FMX$_N$), $N \in \mathbb{N}$, as algorithm maps, to distinguish them from the overall algorithm for (SMXM). We develop the convergence results of our approach based on a constrained finite minimax algorithm map that satisfies the following assumption.

Assumption IV.11. Suppose that Assumption IV.1 holds. Given an $N \in \mathbb{N}$, the algorithm map applied to solve (FMX$_N$) generates a sequence $\{w_i\}_{i=0}^{\infty} \subset X \times Z_N$, and every accumulation point of that sequence is a global minimizer of (FMX$_N$). □

In view of the above results, the algorithm for (SMXM) is simple.

Algorithm IV.1. Semi-Infinite Min-Max-Min Algorithm
Parameter: $N \in \mathbb{N}$.

Step 1. Generate a sequence $\{w_i\}_{i=0}^{\infty}$ by applying a constrained finite minimax algorithm map that satisfies Assumption IV.11 to (FMX$_N$). □
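The following minimal Python sketch runs Algorithm IV.1 on a hypothetical toy instance with $d = s = 1$: $\phi(x, y, z) = (x - y)^2 + (z - x/2)^2$, $X = [-2, 2]$ and $Z = [-1, 1]$. The dissertation leaves the algorithm map open. As a stand-in, we use a projected subgradient method with diminishing steps. For this toy problem, each $f^j$ is convex and $W$ is a box, so the best iterate's value converges to the optimal value. This stand-in does not satisfy Assumption IV.11 for general nonconvex instances.

```python
import math
from typing import List, Sequence, Tuple


def algorithm_iv1_toy(ys: Sequence[float], w0: Sequence[float],
                      iters: int = 20000, c: float = 0.05) -> Tuple[List[float], float]:
    """Projected subgradient method (stand-in algorithm map) applied to (FMX_N) for
    the hypothetical instance phi(x, y, z) = (x - y)^2 + (z - x/2)^2,
    X = [-2, 2], Z = [-1, 1], so w = (x, z_1, ..., z_N).
    Returns the best iterate found and its value Psi_N(w)."""
    N = len(ys)
    lo = [-2.0] + [-1.0] * N            # box W = X x Z^N
    hi = [2.0] + [1.0] * N

    def f(j: int, w: Sequence[float]) -> float:
        """f^j(w) = phi(x, y_j, z_j)."""
        x, zj = w[0], w[1 + j]
        return (x - ys[j]) ** 2 + (zj - 0.5 * x) ** 2

    def Psi(w: Sequence[float]) -> float:
        return max(f(j, w) for j in range(N))

    w = list(w0)
    best_w, best_val = list(w), Psi(w)
    for k in range(iters):
        j = max(range(N), key=lambda i: f(i, w))    # an active index
        x, zj = w[0], w[1 + j]
        g = [0.0] * (N + 1)                         # gradient of f^j: a subgradient of Psi
        g[0] = 2.0 * (x - ys[j]) - (zj - 0.5 * x)
        g[1 + j] = 2.0 * (zj - 0.5 * x)
        step = c / math.sqrt(k + 1)                 # diminishing step size
        w = [min(max(wi - step * gi, l), h) for wi, gi, l, h in zip(w, g, lo, hi)]
        val = Psi(w)
        if val < best_val:
            best_w, best_val = list(w), val
    return best_w, best_val


w_best, val = algorithm_iv1_toy(ys=[-1.0, 0.5, 1.0], w0=[1.0, 0.0, 0.0, 0.0])
x_star = w_best[0]  # x-component: approximate global minimizer of the discretized problem
```

Here the inner minimum over $z_j$ equals $(x - y_j)^2$, so for `ys = [-1.0, 0.5, 1.0]` the discretized problem has global minimizer $x = 0$ with optimal value 1. Accordingly, `val` should approach 1 and `x_star` should approach 0. Returning the $x$-component follows the spirit of Theorem IV.12, which extracts $x^*$ from an accumulation point $(x^*, \mathbf{z}^*)$.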

The next theorem implies that from every accumulation point of the generated sequence, we can directly extract a global minimizer of the discretized min-max-min problem (SMXM$_N$): its $x$-component. Hence, by Theorem IV.7, if the level of discretization $N$ is large, the points constructed in this way approximate a global minimizer of the original semi-infinite min-max-min problem (SMXM), and they approach such a minimizer as $N$ increases to infinity.

Theorem IV.12. Suppose that Assumption IV.1 holds, and that Algorithm IV.1 is applied to solve (SMXM) with a given $N \in \mathbb{N}$ and generates a sequence $\{w_i\}_{i=0}^{\infty} \subset X \times Z_N$. If $w^* = (x^*, \mathbf{z}^*)$, with $x^* \in X$ and $\mathbf{z}^* \in Z_N$, is an accumulation point of $\{w_i\}_{i=0}^{\infty}$, then $x^*$ is a global minimizer of (SMXM$_N$).

Proof. By Assumption IV.11, $w^*$ is a global minimizer of (FMX$_N$), so $x^*$ is a global minimizer of (SMMX$_N$). The conclusion then follows directly from Theorem IV.9. □

D.  NUMERICAL  RESULTS 

In this section, we apply our approach to a DAD problem with a ten-node, 18-arc network, as shown in Figure 2. The problem parameters, e.g., ubsupply$_u$, demand$_u$, and totalsorties, are drawn from uniform distributions with bounds that we specify. We set the bounds so that more supply can be placed at nodes 1-5 than at nodes 6-10, while the demands are higher at nodes 6-10 than at nodes 1-5. This ensures that flow goes from the left-hand side of the network to the right-hand side. We refer to Appendix E for the problem parameters generated and used in this study. We use a discretization level of $N = 1{,}000$, i.e., we consider 1,000 randomly generated attack plans. Each attack plan specifies the sorties to launch against the 18 arcs.

We solve the constrained finite minimax problem constructed in our approach by reformulating it as a standard nonlinear constrained problem and solving it with a sequential quadratic programming (SQP) algorithm. We implement and run the algorithm in MATLAB version 7.10 (R2010a) (see Mathworks, 2009) on a 3.46 GHz PC with two quad-core processors and 24 GB of RAM, running Windows 7 Pro. We use the SQP algorithm in the TOMLAB SNOPT solver (see Gill et al., 2007).
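We assume the reformulation into a standard nonlinear constrained problem mentioned above is the usual epigraph form of a finite minimax problem. The text does not spell it out, so the following is a sketch in the notation of (IV.22)-(IV.24):

```latex
\begin{aligned}
\min_{(w,\,t) \in W \times \mathbb{R}} \quad & t \\
\text{subject to} \quad & f^j(w) - t \le 0, \qquad j \in \mathcal{N} = \{1, \ldots, N\}.
\end{aligned}
```

For any feasible $(w, t)$ we have $t \ge \max_{j \in \mathcal{N}} f^j(w) = \Psi_N(w)$, and $t = \Psi_N(w)$ is feasible. Hence $(w^*, t^*)$ solves this problem exactly when $w^*$ solves (FMX$_N$) and $t^* = \Psi_N(w^*)$. The constraints are smooth whenever the $f^j$ are, which is what makes an SQP solver applicable.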

For our problem with ten nodes, 18 arcs, and a discretization level $N = 1{,}000$, (FMX$_N$) has 1,000 functions and approximately 18,000 variables and takes approximately 4.5 hours to solve. In contrast, (GFMX$_N$) has 11,000 functions and approximately 38,000 variables and takes approximately 1.5 hours. The smaller (FMX$_N$) requires a longer run time because more components of its formulation are nonlinear: the balance-of-flow and capacity constraints are modeled as nonlinear penalty cost terms in the objective function.
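The variable counts above are consistent with $w \in \mathbb{R}^{d+Ns}$ under assumptions we make here, which are not stated in this section:

- $d = 10$ (one supply variable per node).
- Each $z_j$ holds one flow per arc ($s = 18$) in (FMX$_N$).
- Each $z_j$ additionally holds one EXCESSSUPPLY and one EXTRASUPPLY variable per node ($s = 38$) in (GFMX$_N$).

```latex
\begin{aligned}
\text{(FMX}_N\text{)}:\quad & d + Ns = 10 + 1000 \cdot 18 = 18{,}010 \approx 18{,}000, \\
\text{(GFMX}_N\text{)}:\quad & d + Ns = 10 + 1000 \cdot (18 + 10 + 10) = 38{,}010 \approx 38{,}000.
\end{aligned}
```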




We first discuss the results for (SMXM). Figure 2 shows the solution obtained from solving (FMX$_N$). The optimal supply and the required demand are stated on the nodes. The supply numbers highlighted in red (specifically, those for nodes 6-10) indicate that the proposed supplies are at their ubsupply$_u$ values. The worst-case attack plan concentrates the attack on arcs (4,6) and (5,7); see Appendix E for details on this attack plan. This worst-case attack plan is reasonable given the problem parameters. More supply can be placed at the nodes on the left-hand side of the network, while higher demands are required at the nodes on the right-hand side, and both attacked arcs run from a left-hand node to a right-hand node. The optimal flow after the worst-case attack is stated on the arcs, and the objective function value is 1,288.


[Figure 2: optimal supply and flow solution for (SMXM). Nodes show SUPPLY$_u$, ubsupply$_u$, and (demand$_u$); arcs show the optimal flow after the worst-case attack; Obj. value = 1,288.]


The proposed supply sums to 46.0, which is less than the total demand of 54.5. We develop another optimization model, which we refer to as the verification model, with two goals: to check whether the solution obtained from our approach is reasonable, and to determine why the proposed supply is less than the total demand. Given an arbitrary SUPPLY, the verification model runs through all 1,000 attack plans and, for each attack plan, determines the optimal flow and the associated objective function value. We test the verification model with several alternative supply solutions that sum to the required demand of 54.5, and obtain objective function values no smaller than approximately 2,300. We conclude that the proposed supply is less than the total demand because the sorties reduce the capacity of the network so much that additional supply cannot flow to satisfy the outstanding demand. Thus, adding supply brings no benefit. Worse, it incurs the cost of storing the additional supply as well as a penalty for violating the balance-of-flow constraints, since the additional supply cannot flow out.
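The verification model described above evaluates a fixed SUPPLY against every attack plan. The following minimal Python sketch shows that outer loop. The callback `inner_value`, which would solve the defender's flow problem after an attack, is a hypothetical placeholder. Reporting the maximum over plans is our reading of the worst-case objective.

```python
from typing import Callable, List, Sequence, Tuple

Plan = Sequence[float]  # sorties launched against each arc


def verify_supply(supply: Sequence[float], attack_plans: Sequence[Plan],
                  inner_value: Callable[[Sequence[float], Plan], float]
                  ) -> Tuple[float, int, List[float]]:
    """Evaluate a fixed SUPPLY against every attack plan.

    inner_value(supply, plan) is a hypothetical callback that solves the defender's
    flow problem after the attack `plan` and returns its optimal objective value.
    Returns the worst-case value, the index of the worst attack plan, and all values."""
    values = [inner_value(supply, plan) for plan in attack_plans]
    worst = max(range(len(values)), key=values.__getitem__)
    return values[worst], worst, values
```

In the study, `attack_plans` would hold the 1,000 generated plans, and `inner_value` would be the flow model with its balance-of-flow and capacity penalties.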

We next state the results for (GMXM). Figure 3 shows the solution obtained from solving (GFMX$_N$). The proposed supply is the same as that for (SMXM), except that smaller supplies are placed at nodes 1, 4, and 5. The optimal flow after the worst-case attack is stated on the arcs, and the objective function value is 891; see Appendix E for details on the attack plan. The objective function value for (GMXM) differs significantly from that of (SMXM) because the two problems have different objective functions. The worst-case attack and the optimal flow for (GMXM) also differ significantly from those for (SMXM).

E.  CONCLUSIONS  FOR  SEMI-INFINITE  MIN-MAX-MIN 

This chapter focuses on the semi-infinite min-max-min problem. We propose an approach that, through discretization and reformulation of the original problem, constructs a finite minimax problem of larger dimensionality than the original min-max-min problem. Our approach is the first to solve the generalized semi-infinite min-max-min problem, and it also solves the semi-infinite min-max-min problem. The numerical results show that the approach produces reasonable solutions.
[Figure 3: nodes show SUPPLY$_u$, ubsupply$_u$, and (demand$_u$); arcs show FLOW$_{uv}$ after the worst-case attack; Obj. value = 891.]
Figure  3.  Optimal  Supply  and  Flow  Solution  for  (GMXM). 
V. CONCLUSIONS AND FUTURE RESEARCH

A.  CONCLUSIONS 

Optimization problems with uncertain parameters arise in numerous applications. One possible approach to handle such problems is to consider the worst-case value of the uncertain parameters during optimization. We consider three problems resulting from this approach: a finite minimax problem (FMX), a semi-infinite minimax problem (SMX), and a semi-infinite min-max-min problem (MXM). In all problems, we consider nonlinear functions with continuous variables. We develop rate of convergence and complexity results, and propose algorithms for solving these optimization problems.

We develop rate of convergence and complexity results for smoothing algorithms for solving (FMX) with many functions. We find that smoothing algorithms may only have sublinear rates of convergence, but their complexity in the number of functions $q$ is $O(q \log q)$. This compares to $O(q^3)$ for sequential quadratic programming (SQP) algorithms, which our numerical results, as well as those in the literature, show to be among the fastest for solving (FMX). The competitive complexity of smoothing algorithms is due to their small computational work per iteration. We present two new smoothing algorithms for (FMX) with novel precision-adjustment schemes, and show that they are competitive with other algorithms from the literature. They are especially efficient for problems with many variables, or where a significant number of functions are nearly active at stationary points. The new algorithms are easy to implement and do not require a QP solver, which algorithms such as SQP and the Pshenichnyi-Pironneau-Polak (PPP) minimax algorithm require. One of our proposed precision-adjustment schemes is simpler and more efficient than the scheme used in existing smoothing algorithms, and thus provides a good alternative when developing new smoothing algorithms. Our numerical results indicate that, due to memory limitations, smoothing with first-order gradient methods is likely the only viable approach to solve an (FMX) with a large number $q$ of functions and a large problem dimension $d$. The SQP and PPP algorithms need to compute the gradient information (a $q \times d$ matrix) and pass it to the QP solver. The size of the problem that can be solved is therefore limited by the memory required to store the $q \times d$ matrix, as well as by the memory the QP solver requires to process it. In smoothing algorithms, we do not need the memory to store the full $q \times d$ matrix, because the gradient of the smoothed function can be constructed by sequentially considering portions of the gradient matrix.
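To illustrate the memory argument above, the following minimal Python sketch computes the value and gradient of a smoothed max function while visiting one function at a time, so only $O(d)$ storage is used instead of the $q \times d$ gradient matrix. We assume the exponential (log-sum-exp) smoothing $\frac{1}{p}\log\sum_j \exp(p f^j(x))$, whose value lies between $\max_j f^j(x)$ and $\max_j f^j(x) + \log(q)/p$. The function names and the linear example functions are hypothetical.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

# Each function returns (f_j(x), grad f_j(x)).
FuncWithGrad = Callable[[Sequence[float]], Tuple[float, List[float]]]


def smoothed_max(x: Sequence[float], funcs: Iterable[FuncWithGrad],
                 p: float) -> Tuple[float, List[float]]:
    """Value and gradient of (1/p) * log(sum_j exp(p * f_j(x))), with p > 0.

    Functions are visited one at a time, so only O(d) memory is used. A running
    maximum m keeps exp() from overflowing."""
    m = -math.inf
    total = 0.0                 # sum_j exp(p * (f_j(x) - m))
    g = [0.0] * len(x)          # sum_j exp(p * (f_j(x) - m)) * grad f_j(x)
    for func in funcs:
        val, grad = func(x)
        if val > m:
            # Re-reference the accumulated sums to the new maximum.
            scale = math.exp(p * (m - val)) if m > -math.inf else 0.0
            total *= scale
            g = [gi * scale for gi in g]
            m = val
        weight = math.exp(p * (val - m))
        total += weight
        g = [gi + weight * dgi for gi, dgi in zip(g, grad)]
    if total == 0.0:
        raise ValueError("funcs must contain at least one function")
    return m + math.log(total) / p, [gi / total for gi in g]


def linear(a: List[float], b: float) -> FuncWithGrad:
    """Example function f(x) = a . x + b with constant gradient a."""
    return lambda x: (sum(ai * xi for ai, xi in zip(a, x)) + b, list(a))


funcs = [linear([1.0, 2.0], 0.5), linear([-1.0, 0.0], 1.0), linear([0.5, -0.5], -2.0)]
value, grad = smoothed_max([0.3, -0.2], funcs, p=10.0)
```

Because `funcs` can be any iterable, the functions (or chunks of them) can be generated on the fly rather than held in memory at once.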

For (SMX), we develop and compare rate of convergence results for various fixed and adaptive discretization algorithms, as well as for an ε-subgradient algorithm. We present a novel way of expressing the rate of convergence in terms of computational work instead of the typical number of iterations, and we use it throughout the analysis of (SMX). Hence, we are able to identify algorithms that are competitive due to low computational work per iteration even if they require many iterations. We show that, to solve (SMX), a fixed discretization algorithm with a quadratically or linearly convergent algorithm map for the discretized problem can achieve the same asymptotic convergence rate as an adaptive discretization method. Under certain convexity-concavity assumptions, we show that the rate of convergence of discretization algorithms depends on the dimension of the uncertain parameters, while that of ε-subgradient algorithms does not. This indicates that, under convexity-concavity assumptions, discretization algorithms will not be competitive with ε-subgradient algorithms for moderate to large dimensions of the uncertain parameters. Our numerical results show that discretization algorithms are not competitive with ε-subgradient algorithms for convex-concave problems with a dimension of the uncertain parameters as small as two.

We propose a new approach to solve (MXM), based on discretization and reformulation of (MXM) into a constrained finite minimax problem with a larger dimensionality than the original (MXM). Our approach is the first in the literature to solve (GMXM), and it also solves (SMXM). We apply our approach to a defender-attacker-defender network interdiction problem, and the results demonstrate the viability of our approach.

B.  FUTURE  RESEARCH 

There  are  several  possibilities  for  extending  the  research  of  this  dissertation. 
The  two  smoothing  algorithms  developed  for  (FMX)  in  this  dissertation  produce  a 
working  set  that  is  monotonically  increasing.  The  efficiency  of  the  active-set  SQP 
algorithm  shows  the  potential  benefits  of  an  aggressive  active-set  strategy  that  keeps 
the working set small. However, when we implement the active-set strategy of the SQP algorithm in our smoothing algorithms, we observe slower run times, which suggests that the strategy needs further tuning. Thus, one extension would be to design an active-set strategy tailored to smoothing algorithms.

Another  opportunity  for  extension  concerns  the  precision-adjustment  scheme 
in  the  smoothing  algorithm  for  (FMX)  that  requires  user-specified  parameters.  It 
would  be  worthwhile  to  develop  procedures  for  rationally  selecting  these  parameters, 
as  it  is  difficult  for  users  to  come  up  with  good  choices  for  the  parameters. 

We show that the ε-subgradient algorithm has a better rate of convergence for solving (SMX) than discretization algorithms. However, the ε-subgradient algorithm requires a concavity assumption to ensure that the computational work to obtain an ε-maximizer (global maximum) for the uncertain parameters remains bounded. Without the concavity assumption, it would be interesting to see how the rate of convergence results for other algorithms, such as exchange algorithms, compare to those for discretization algorithms. Exchange algorithms also need some form of discretization or branch-and-bound technique to obtain a global maximizer for the uncertain parameters.

Our approach to solve (MXM) constructs a constrained finite minimax problem with a large number of functions and variables. The constructed problem has a special structure: each function depends on only a small number of variables, namely the outer variable $x$ and a single block $z_j$. This is as many variables as the outermost and innermost minimization problems have combined. It would be useful to develop special first-order algorithms that exploit this structure.
APPENDIX A. FINITE MINIMAX PROBLEMS


Table 17 describes the problem instances used for the numerical studies in Chapter II. Most columns are self-explanatory. Columns 2 and 3 give the number of variables $d$ and functions $q$, respectively. The target values (Column 7) equal the optimal values, if known. Otherwise, they are values slightly adjusted from the optimal values reported for smaller $q$ in Polak et al. (2003) and Zhou and Tits (1996). The same target values are used for ProbA-ProbM in Tables 5 and 6.

In this appendix, we denote components of $x \in \mathbb{R}^d$ by subscripts, i.e., $x = (x_1, x_2, \ldots, x_d) \in \mathbb{R}^d$. When the problem is given in semi-infinite form, as in (A.2a)-(A.2i), the set $Y$ is discretized into $q$ equally spaced points if

$$\psi(x) = \max_{y \in Y} \phi(x, y), \qquad \text{(A.1a)}$$

and into $q/2$ equally spaced points if

$$\psi(x) = \max_{y \in Y} |\phi(x, y)|. \qquad \text{(A.1b)}$$

ProbA is defined by (A.1a) and (A.2a), and ProbB-ProbI by (A.1b) and (A.2b)-(A.2i), respectively.
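The following minimal Python sketch shows how a semi-infinite $\psi$ becomes $q$ functions. It assumes that "equally spaced" includes both endpoints of $Y$. It also assumes that (A.1b) is the absolute-value form $\max_{y} |\phi(x,y)|$, which would explain why only $q/2$ points are used: $|\phi| = \max(\phi, -\phi)$ yields two functions per grid point. The helper names are hypothetical.

```python
from typing import Callable, List, Sequence

Func = Callable[[Sequence[float]], float]


def equally_spaced(a: float, b: float, n: int) -> List[float]:
    """n equally spaced points of [a, b], endpoints included (assumes n >= 2)."""
    return [a + (b - a) * i / (n - 1) for i in range(n)]


def discretize_max(phi: Callable[[Sequence[float], float], float],
                   a: float, b: float, q: int) -> List[Func]:
    """(A.1a): q functions f^i(x) = phi(x, y_i) on a grid of q points."""
    return [lambda x, y=y: phi(x, y) for y in equally_spaced(a, b, q)]


def discretize_abs_max(phi: Callable[[Sequence[float], float], float],
                       a: float, b: float, q: int) -> List[Func]:
    """(A.1b): q/2 grid points; |phi| = max(phi, -phi) gives two functions per point."""
    ys = equally_spaced(a, b, q // 2)
    return [lambda x, y=y, sign=sign: sign * phi(x, y)
            for y in ys for sign in (1.0, -1.0)]
```

In both cases the resulting finite minimax objective is `max(f(x) for f in funcs)`.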


=  (?y2  -  l)x  +  y(l  -y)(l  -x),  Y  =  [0,1], 

(A. 2a) 

$$\phi(x, y) = (1 - y^2) - (0.5x^2 - 2yx), \quad Y = [-1, 1], \qquad \text{(A.2b)}$$

$$\phi(x, y) = y^2 - (y x_1 + x_2 \exp(y)), \quad Y = [0, 2], \qquad \text{(A.2c)}$$

0(®,J/)=1+  xlexp(yx2),  1  =[  0.5, 0.5], 

(A. 2d) 

$$\phi(x, y) = \sin y - (y^2 x_3 + y x_2 + x_1), \quad Y = [0, 1], \qquad \text{(A.2e)}$$

(f>(x,y)  =  exp(y)  ,  V  =  [0, 1], 

1  +  yx-i 

(A.2f) 
$$\phi(x, y) = \sqrt{y} - [x_4 - (y^2 x_1 + y x_2 + x_3)^2], \quad Y = [0.25, 1], \qquad \text{(A.2g)}$$

$$\phi(x, y) = \frac{1}{1 + y} - [x_1 \exp(y x_3) + x_2 \exp(y x_4)], \quad Y = [-0.5, 0.5], \qquad \text{(A.2h)}$$

$$\phi(x, y) = \frac{1}{1 + y} - [x_1 \exp(y x_4) + x_2 \exp(y x_5) + x_3 \exp(y x_6)], \quad Y = [-0.5, 0.5]. \qquad \text{(A.2i)}$$

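ProbJ-ProbM are defined by $\psi(x) = \max_{j \in \{1, \ldots, q\}} f^j(x)$, with $f^j(x)$ as in (A.2j)-(A.2m), respectively:

$$f^j(x) = x_j^2, \quad j \in \{1, \ldots, q\}, \qquad \text{(A.2j)}$$

$$f^j(x) = x_{(j-1)2+1}^2 + x_{2j}^2, \quad j \in \{1, \ldots, q\}, \qquad \text{(A.2k)}$$

$$f^j(x) = x_{(j-1)4+1}^2 + x_{(j-1)4+2}^2 + x_{(j-1)4+3}^2 + x_{4j}^2, \quad j \in \{1, \ldots, q\}, \qquad \text{(A.2l)}$$

$$f^j(x) = x_{k_j}^2 + x_{l_j}^2, \quad j \in \left\{1, 2, 3, \ldots, \tbinom{d}{2}\right\}, \qquad \text{(A.2m)}$$

where the pairs $(k_j, l_j)$ range over all 2-combinations (see Section 3.3 of Brualdi 2004) of $\{1, 2, 3, \ldots, d\}$, and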

F(x)  =  cijx2  +  bjXi  +  Cj ,  j  =  {1,  (A.2n) 

where  i  —  ^  ,  and  dj,bj ,  Cj  are  randomly  generated  from  a  uniform  distribution  on 
[0.5,1], 
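The following minimal Python sketch builds the separable test functions (A.2j)-(A.2m): block sums of squares with block size 1, 2, or 4, and one function per 2-combination of indices. It uses 0-based indices and assumes $d$ is a multiple of the block size. The function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Func = Callable[[Sequence[float]], float]


def block_functions(d: int, b: int) -> List[Func]:
    """(A.2j)-(A.2l) with block size b = 1, 2, 4: f^j(x) is the sum of x_i^2 over the
    j-th block of b consecutive components, giving q = d / b functions."""
    if d % b != 0:
        raise ValueError("d must be a multiple of the block size b")
    return [lambda x, start=start: sum(xi * xi for xi in x[start:start + b])
            for start in range(0, d, b)]


def pair_functions(d: int) -> List[Func]:
    """(A.2m): f^j(x) = x_k^2 + x_l^2 for every 2-combination {k, l} of the d indices."""
    return [lambda x, k=k, l=l: x[k] ** 2 + x[l] ** 2
            for k, l in combinations(range(d), 2)]


def finite_max(funcs: Sequence[Func], x: Sequence[float]) -> float:
    """psi(x) = max_j f^j(x)."""
    return max(f(x) for f in funcs)
```

For (A.2m) the number of functions is $q = \binom{d}{2}$, so $q$ grows quadratically in $d$.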
Table 17. Finite minimax problem instances. An asterisk * indicates that the problem instance was created by the authors.
APPENDIX B. FINITE MINIMAX ALGORITHM DETAILS AND PARAMETERS


PPP. The Pshenichnyi-Pironneau-Polak min-max algorithm (Algorithm 2.4.1 in Polak 1997) uses $\alpha = 0.5$, $\beta = 0.8$, and $\delta = 1$. We use the same Armijo parameters $\alpha$ and $\beta$ for all algorithms.

ε-PPP. The ε-Active PPP algorithm (Algorithm 2.4.34 in Polak (1997); see also Polak 2008) uses the same parameters as above. We implement the most recent version (Polak, 2008).

SQP-2QP. Sequential Quadratic Programming with two QPs in each iteration (Algorithm 2.1 of Zhou & Tits 1996) uses the parameters recommended in Zhou and Tits (1996) and a monotone line search. (We examined the use of the nonmonotone line search in CFSQP, but found it inferior to the monotone line search on the set of problem instances.)

SQP-1QP. Sequential Quadratic Programming with one QP in each iteration (Algorithm A in Zhu et al. 2009) uses the mid-point values stated in Algorithm A, $\alpha = 0.25$ (not the Armijo parameter), $r = 2.5$, and $H_0 = I$. A co-author uses the same settings for $\alpha$ and $H_0$ in Zhu and Zhang (2005).

SMQN. The Smoothing Quasi-Newton algorithm (Algorithm 3.2 in Polak et al. 2008) uses $p_0 = 1$, $B(\cdot) = I$, and the Parameter Adjustment subroutine version “Case (A)” of Polak et al. (2003).

Algorithm II.2. This algorithm uses the same parameters as SMQN, except in the Adaptive Penalty Parameter Adjustment subroutine, where it uses £ = 2, = 2.

Algorithm  II. 3.  This  algorithm  use  parameters  t  =  10 ~5,(p  =  l,Po  —  l,p  = 
(log  q/t)  •  1010,  k  =  1030,  f  =  2, 7  =  t  ■  10”10,  v  =  0.5,  A p  =  10. 
APPENDIX C. SEMI-INFINITE MINIMAX PROBLEMS

In this appendix, we denote components of $x \in \mathbb{R}^d$ and $y \in Y \subset \mathbb{R}^m$ by subscripts, for example, $x = (x_1, x_2, \ldots, x_d)$. Three problem instances, two of which are from Rustem and Howe (2002), are used for the numerical studies in Chapter III. The problem (SMX) to be solved is as defined by (III.1) and (III.2), with $\phi(x, y)$ as defined below:


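$$\phi(x, y) = 5\sum_{i=1}^{2} x_i^2 - y^2 + x_1(-y + 5) + x_2(y + 3), \qquad \text{(C.1a)}$$

$$\phi(x, y) = 5\sum_{i=1}^{2} x_i^2 - \sum_{i=1}^{2} y_i^2 + x_1(-y_1 + y_2 + 5) + x_2(y_1 - y_2 + 3), \qquad \text{(C.1b)}$$

$$\phi(x, y) = -(x_1 - 1)y_1 - (x_2 - 2)y_2 - (x_3 - 1)y_3 + 2x_1^2 + 3x_2^2 + x_3^2 - \sum_{i=1}^{3} y_i^2. \qquad \text{(C.1c)}$$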

The second (SProbB) and third (SProbC) problem instances are Problems 1 and 5 on pp. 100-102 of Rustem and Howe (2002), respectively. SProbB and SProbC have y-dimensionality of two and three, respectively. There are no problem instances in Rustem and Howe (2002) with y-dimensionality of one. We create SProbA from SProbB by removing $y_2$ and replacing $y_1$ by $y$. All three problem instances are convex-concave, i.e., $\phi(\cdot, y)$ is convex for any fixed $y \in Y$, and $\phi(x, \cdot)$ is concave for any fixed $x \in \mathbb{R}^d$. As $\phi(\cdot, y)$ is convex for any fixed $y$, a subgradient is guaranteed to exist, which is a prerequisite for the ε-subgradient algorithm, Algorithm III.3. As $\phi(x, \cdot)$ is strictly concave for any fixed $x$, there exists a unique $y$-maximizer for each fixed $x$.

Table 18 provides more details on the problem instances. Columns 2 and 3 give the dimension $d$ of the solution space and the dimension $m$ of the uncertain parameter, respectively. Columns 4 and 5 give the initial point $x_0$ for the solution and the initial point $y_0$ for the uncertain parameter, respectively. Note that $y_0$ is only relevant for the ε-subgradient algorithm, Algorithm III.3, as the discretization algorithms do not require it.
For the target solutions (last column), we use Algorithm III.3 with parameters as in Appendix D, because it showed significantly faster run times than the discretization algorithms in preliminary experiments. We start with a stepsize of $\alpha = 0.1$ and run the algorithm until the solution remains unchanged for more than ten iterations. We then use this solution to warm-start the next stage, in which we decrease the stepsize to $\alpha = 0.01$, and we repeat the process until $\alpha = 10^{-5}$. We do not use the optimal solutions reported in Rustem and Howe (2002), because Rustem and Howe (2002) use a different termination criterion. The optimal solutions obtained with our procedure agree with those reported in Rustem and Howe (2002) to at least the fourth decimal place.

The  other  columns  are  self-explanatory. 
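The staged procedure used to compute the target solutions can be sketched as follows. `step(x, alpha)` is a hypothetical stand-in for one iteration of Algorithm III.3. "Unchanged" is interpreted with a small tolerance `tol`, and `max_iter` caps each stage. Both are assumptions of this sketch, as is the scalar iterate, which keeps the example short.

```python
from typing import Callable, Sequence


def staged_run(step: Callable[[float, float], float], x0: float,
               alphas: Sequence[float] = (1e-1, 1e-2, 1e-3, 1e-4, 1e-5),
               patience: int = 10, tol: float = 1e-12,
               max_iter: int = 100000) -> float:
    """Run x <- step(x, alpha) in stages with decreasing stepsizes.

    A stage ends once x has stayed unchanged (within tol) for more than `patience`
    consecutive iterations, or after max_iter iterations. Its final point
    warm-starts the next stage."""
    x = x0
    for alpha in alphas:
        unchanged = 0
        for _ in range(max_iter):
            x_new = step(x, alpha)
            unchanged = unchanged + 1 if abs(x_new - x) <= tol else 0
            x = x_new
            if unchanged > patience:
                break
    return x


# Example with a gradient step on (x - 3)^2 as the hypothetical iteration.
x_target = staged_run(lambda x, a: x - a * 2.0 * (x - 3.0), 0.0)
```

Warm-starting each stage from the previous stage's final point is what keeps the small stepsizes of the later stages affordable.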
APPENDIX D. SEMI-INFINITE MINIMAX ALGORITHM DETAILS AND PARAMETERS


Algorithm III.1 (applies to both ε-PPP and SQP-2QP). Given a discretization parameter $N \in \mathbb{N}$ ($N = |Y_N|$), the discretization scheme discretizes each dimension of $y$ into $N^{1/m}$ equally spaced points, which gives a total of $N$ grid points. For the numerical studies, $N$ is chosen such that $N^{1/m}$ is an integer.
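The following minimal Python sketch of the grid described above places $N^{1/m}$ equally spaced points in each of the $m$ dimensions of a box $Y$ and takes their Cartesian product. That endpoints are included, and that a single point per axis is placed at the midpoint, are assumptions of this sketch. The function names are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple


def integer_root(N: int, m: int) -> int:
    """Return n with n**m == N, or raise ValueError if N^(1/m) is not an integer."""
    n = round(N ** (1.0 / m))
    for cand in (n - 1, n, n + 1):       # guard against floating-point rounding
        if cand >= 1 and cand ** m == N:
            return cand
    raise ValueError(f"N = {N} is not a perfect {m}-th power")


def uniform_grid(bounds: Sequence[Tuple[float, float]], N: int) -> List[Tuple[float, ...]]:
    """Discretize the box Y = prod_i [a_i, b_i] (m = len(bounds)) into N grid points,
    using N^(1/m) equally spaced points (endpoints included) per dimension."""
    n = integer_root(N, len(bounds))
    axes = [[a + (b - a) * i / (n - 1) if n > 1 else 0.5 * (a + b) for i in range(n)]
            for a, b in bounds]
    return list(itertools.product(*axes))
```

The requirement that $N^{1/m}$ be an integer appears here as the check in `integer_root`.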

Algorithm III.1 with ε-PPP. The ε-PPP algorithm is the same algorithm used in Chapter II. The same Armijo parameters $\alpha = 0.5$, $\beta = 0.8$, and $\delta = 1$ are used. The parameter $\varepsilon$ for determining the active set is set to $10^{-3}$, which is the value used for the algorithm comparison in Chapter II.

Algorithm III.1 with SQP-2QP. The SQP-2QP algorithm is the same algorithm used in Chapter II. As in Chapter II, we use the parameters recommended in Zhou and Tits (1996) and a monotone line search. The parameter $\varepsilon$ for determining the active set is set to 1, which is the value used for the algorithm comparison in Chapter II.

Algorithm III.3. Our preliminary numerical tests show very fast run times for Algorithm III.3 compared to the two discretization algorithms. Thus, we spend minimal effort on sensitivity analyses to fine-tune the algorithm parameters. We use a constant stepsize $\alpha = 0.1$, which the preliminary tests show to be robust. For Step 2 of Algorithm III.3, we use TOMLAB SNOPT with its default tolerances to find the $y$-maximizer. The final $y$ iterate from the previous major iteration is used to warm-start the search for the $y$-maximizer in the current major iteration.
APPENDIX E. SEMI-INFINITE MIN-MAX-MIN PROBLEM PARAMETERS AND RESULTS


We generate the data for the defender-attacker-defender (DAD) network using the random generators in Table 19; the same data are used for both (SMXM) and (GMXM). Once the problem data are generated (see Table 20), we hold them fixed for the analyses. The values of $\varepsilon$, penbal, and pencap are not problem data but are required for the formulation, so we do not generate them randomly. Instead, we fix them at $\varepsilon = 0.001$, penbal $= 50$, and pencap $= 50$.


Data           Random generators
costsupply     5×U(2,4), 5×U(10,13)
costflow       U(1,11)
demand         5×U(2,3), 5×U(8,10)
totalsorties   U(20,22)
ubsupply       U(10,12)
ubflow         U(5,15)
vul            U(0.5,2)
vulcap         0.6 × ubflow

Table 19. Random generators used to produce the DAD problem parameters in Table 20. The notation 5×U(2,4), 5×U(10,13) means that a total of ten random numbers are generated: the first five are uniformly distributed between two and four, and the last five are uniformly distributed between ten and 13.

We generate 1,000 random attack plans, each with total sorties over the 18 arcs no greater than 20.7. Specifically, we generate 18 random numbers, each distributed as U(0, 20.7/4). If the sum of the 18 numbers is no greater than 20.7, we accept the set as an attack plan. We repeat until we accumulate 1,000 attack plans. The factor “4” in “20.7/4” is chosen empirically.
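The following minimal Python sketch implements the rejection sampling described above. The seed and the use of Python's `random` module are assumptions; the dissertation's generator was implemented in MATLAB and is not shown.

```python
import random
from typing import List


def generate_attack_plans(num_plans: int = 1000, num_arcs: int = 18,
                          max_sorties: float = 20.7, factor: float = 4.0,
                          seed: int = 0) -> List[List[float]]:
    """Rejection sampling: draw num_arcs values from U(0, max_sorties / factor)
    and accept the set as an attack plan if its total is <= max_sorties."""
    rng = random.Random(seed)
    plans: List[List[float]] = []
    while len(plans) < num_plans:
        plan = [rng.uniform(0.0, max_sorties / factor) for _ in range(num_arcs)]
        if sum(plan) <= max_sorties:
            plans.append(plan)
    return plans


small = generate_attack_plans(num_plans=20, num_arcs=6, seed=1)  # quick demonstration
```

The default arguments mirror the text. With 18 arcs and factor 4, however, the acceptance probability of a candidate set is only about $10^{-5}$ (the Irwin-Hall probability that 18 uniform draws sum to at most 4 of their range). A pure-Python loop at those settings is therefore slow, which is why the demonstration call uses fewer arcs.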

An initial point $w_0 = (0.5 \times ubsupply, 0.5 \times ubflow, 0.5 \times ubflow, \ldots)$ is used for (SMXM). An initial point $w_0 = (0.5 \times ubsupply, 0.5 \times ubflow, 0.5 \times ubflow, \ldots, 0.5 \times ubflow, 0, 0, 0)$ is used for (GMXM), where the string of 0's is for EXCESSSUPPLY and EXTRASUPPLY.

Based on the 1,000 attack plans, the (SMXM) solution obtained from solving (FMX$_N$) is shown in Table 21. The table states (i) the optimal supply distribution plan before the attack, (ii) the worst-case attack plan, and (iii) the optimal flow after the worst-case attack. The objective function value for this solution is 1,288.

For (GMXM), the solution is shown in Table 22, with an objective function value of 891. Note that the objective functions of (SMXM) and (GMXM) are different, which explains the significantly different objective function values.
Table 22. Results for (GMXM).